How does polling a file for changes work? - python

The problem
I expected the script below to print at most one event and then stop (it's written only to illustrate the problem).
#!/usr/bin/env python
from select import poll, POLLIN

filename = "test.tmp"
# make sure file exists
open(filename, "a").close()
file = open(filename, "r+")

p = poll()
p.register(file.fileno(), POLLIN)
while True:
    events = p.poll(100)
    for e in events:
        print e
        # Read data, so that the event goes away?
        file.read()
However, it prints about 70000 events per second. Why?
Background
I've written a class that uses the pyudev.Monitor class internally. Amongst other things, it polls the file descriptor supplied by the fileno() method for changes, using a poll object.
Now I'm trying to write a unit test for my class (I realize I'm supposed to write the unit test first, so no need to point it out), and therefore I need to write my own fileno() method for my mock pyudev.Monitor object. I need to control it so that I can trigger the poll object to report an event, but as the above code demonstrates, I can't make it stop reporting seemingly non-existent events!
I can find no acknowledge_event() or similar method in the poll class to make the event go away (I suspect there's just one event that's somehow stuck); searching Google and this site has yielded nothing. I'm using Python 2.6.6 on Ubuntu 10.10.

Polling a regular file doesn't do what you expect: a regular file is always considered ready for reading, so poll() reports POLLIN on every call no matter what you do. You'll have better luck using pipes, which only report POLLIN while there is unread data in them. Try this instead:
#!/usr/bin/env python
import os
from select import poll, POLLIN

r_fd, w_fd = os.pipe()
p = poll()
p.register(r_fd, POLLIN)
os.write(w_fd, 'X')  # Put something in the pipe so p.poll() will return
while True:
    events = p.poll(100)
    for e in events:
        print e
        os.read(r_fd, 1)
This will print out the single event you're looking for. To trigger the poll event, all you have to do is write a byte to the writeable file descriptor.
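Applied to the testing problem in the question, the mock Monitor can own such a pipe: its fileno() hands out the read end, and a test helper writes a byte to raise an event. A sketch (the class and method names below are our own invention, not pyudev API):
import os

class MockMonitor(object):
    # Stands in for pyudev.Monitor in tests: poll() reports POLLIN
    # only while a byte written by trigger_event() remains unread.
    def __init__(self):
        self._r_fd, self._w_fd = os.pipe()

    def fileno(self):
        return self._r_fd

    def trigger_event(self):
        os.write(self._w_fd, 'X')  # wakes up anything polling fileno()

    def acknowledge_event(self):
        os.read(self._r_fd, 1)  # consume the byte so the event goes away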

Related

Pythoncom - Passing same COM object to multiple threads

Hello :) I'm a complete beginner when it comes to COM objects, so any help is appreciated!
I'm working on a Python program that is supposed to read incoming MS-Word documents in a client/server fashion, i.e. the client sends a request (one or multiple MS-Word documents) and the server reads specific content from those requests using pythoncom and win32com.
Because I want to minimize waiting time for the client (the client needs a status message from the server), I do not want to open an MS-Word instance for every request. Hence, I intend to have a pool of running MS-Word instances from which the server can pick and choose. This, in turn, means I have to reuse those instances from the pool in different threads, and this is what causes trouble right now. After I read Using win32com with multithreading, my dummy code for the server looks like this:
import pythoncom, win32com.client, threading, psutil, os, queue, time, datetime

appPool = {'WINWORD.EXE': queue.Queue()}

def initAppPool():
    global appPool
    wordApp = win32com.client.DispatchEx('Word.Application')
    appPool["WINWORD.EXE"].put(wordApp)  # For testing purposes I only use one MS-Word instance currently

def run_in_thread(appid, path):
    # open doc, read/do some stuff, close it and reattach MS-Word instance to pool
    pythoncom.CoInitialize()
    wordApp = win32com.client.Dispatch(pythoncom.CoGetInterfaceAndReleaseStream(appid, pythoncom.IID_IDispatch))
    doc = wordApp.Documents.Open(path)
    time.sleep(3)  # read out some content ...
    doc.Close()
    appPool["WINWORD.EXE"].put(wordApp)

if __name__ == '__main__':
    initAppPool()
    pathOfFile2BeRead1 = r'C:\Temp\file4.docx'
    pathOfFile2BeRead2 = r'C:\Temp\file5.doc'

    # treat first request
    wordApp = appPool["WINWORD.EXE"].get(True, 10)
    pythoncom.CoInitialize()
    wordApp_id = pythoncom.CoMarshalInterThreadInterfaceInStream(pythoncom.IID_IDispatch, wordApp)
    readDocjob1 = threading.Thread(target=run_in_thread, args=(wordApp_id, pathOfFile2BeRead1), daemon=True)
    readDocjob1.start()

    # wait here until readDocjob1 is done
    wait = True
    while wait:
        try:
            wordApp = appPool["WINWORD.EXE"].get(True, 1)
            wait = False
        except queue.Empty:
            print(f"[{datetime.datetime.now()}] error: appPool empty")
        except BaseException as err:
            print(f"[{datetime.datetime.now()}] error: {err}")
So far everything works as expected, but when I start a second request similar to the first one:
(x) wordApp_id = pythoncom.CoMarshalInterThreadInterfaceInStream(pythoncom.IID_IDispatch, wordApp)
readDocjob2 = threading.Thread(target=run_in_thread,args=(wordApp_id,pathOfFile2BeRead2), daemon=True)
readDocjob2.start()
I receive the following error message for the line marked (x): "The application called an interface that was marshaled for a different thread."
I thought that was exactly why I have to use pythoncom.CoGetInterfaceAndReleaseStream, to jump between threads with the same COM object? And besides that, why does it work the first time but not the second time?
I searched for different solutions on StackOverflow which use CoMarshalInterface instead of CoMarshalInterThreadInterfaceInStream, but they all gave me the same error. I'm really confused right now.
EDIT:
After fixing the error as mentioned in the comments, I ran into a mysterious behavior.
When the second job is executed:
wordApp_id = pythoncom.CoMarshalInterThreadInterfaceInStream(pythoncom.IID_IDispatch, wordApp)
readDocjob2 = threading.Thread(target=run_in_thread,args=(wordApp_id,pathOfFile2BeRead2), daemon=True)
readDocjob2.start()
The function run_in_thread terminates immediately without executing any line; it seems that pythoncom.CoInitialize() is not working properly.
The script finishes without any error messages though.
def run_in_thread(instance, appid, path):
    # open doc, read/do some stuff, close it and reattach MS-Word instance to pool
    pythoncom.CoInitialize()
    wordApp = win32com.client.Dispatch(pythoncom.CoGetInterfaceAndReleaseStream(appid, pythoncom.IID_IDispatch))
    doc = wordApp.Documents.Open(path)
    time.sleep(3)  # read out some content ...
    doc.Close()
    instance.flag = True
What happens is that you put back into the appPool a COM reference that you got from CoGetInterfaceAndReleaseStream. But this reference was created specially for the new thread, and you then call CoMarshalInterThreadInterfaceInStream on this new reference.
That's what is wrong.
You must always marshal the original COM reference, from the thread that created it, to be able to call CoMarshalInterThreadInterfaceInStream repeatedly.
So, to solve the problem, you must change how your app pool works: use some kind of "in use" flag, but don't touch the original COM reference.
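For illustration, a minimal sketch of that design (the WordInstance class and its in_use flag are our own invention; the pythoncom calls are the ones used above). The original reference lives only in the main thread, a fresh stream is marshaled from it for every job, and the worker's proxy is never put back anywhere:
import threading
import pythoncom
import win32com.client

class WordInstance:
    # Hypothetical pool entry: the ORIGINAL COM reference, created and
    # kept in the main thread, plus an "in use" flag instead of a queue.
    def __init__(self):
        self.app = win32com.client.DispatchEx('Word.Application')
        self.in_use = False

def run_in_thread(stream, path):
    pythoncom.CoInitialize()
    try:
        # Unmarshal a thread-local proxy; use it here, never re-queue it.
        wordApp = win32com.client.Dispatch(
            pythoncom.CoGetInterfaceAndReleaseStream(stream, pythoncom.IID_IDispatch))
        doc = wordApp.Documents.Open(path)
        doc.Close()
    finally:
        pythoncom.CoUninitialize()

pythoncom.CoInitialize()
instance = WordInstance()
for path in (r'C:\Temp\file4.docx', r'C:\Temp\file5.doc'):
    instance.in_use = True
    # Marshal the ORIGINAL reference again for each new thread.
    stream = pythoncom.CoMarshalInterThreadInterfaceInStream(
        pythoncom.IID_IDispatch, instance.app)
    job = threading.Thread(target=run_in_thread, args=(stream, path), daemon=True)
    job.start()
    job.join()
    instance.in_use = False  # instance.app is untouched and can be marshaled again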

How to make a python script stopable from another script?

TL;DR: If you have a program that should run for an undetermined amount of time, how do you code something to stop it when the user decides it is time? (Without KeyboardInterrupt or killing the task.)
--
I've recently posted this question: How to make my code stopable? (Not killing/interrupting)
The answers did address my question, but from a termination/interruption point of view, and that's not really what I wanted (although my question didn't make that clear).
So, I'm rephrasing it.
I created a generic script for example purposes. I have a class that gathers data from a generic API and writes the data into a CSV file. The script is started by typing python main.py in a terminal window.
import time, csv
import GenericAPI

class GenericDataCollector:
    def __init__(self):
        self.generic_api = GenericAPI()
        self.loop_control = True

    def collect_data(self):
        while self.loop_control:  # Can this var be changed from outside of the class? (Maybe one solution)
            data = self.generic_api.fetch_data()  # Returns a JSON with some data
            self.write_on_csv(data)
            time.sleep(1)

    def write_on_csv(self, data):
        with open('file.csv', 'wt') as f:
            writer = csv.writer(f)
            writer.writerow(data)

def run():
    obj = GenericDataCollector()
    obj.collect_data()

if __name__ == "__main__":
    run()
The script is supposed to run forever OR until I command it to stop. I know I can just press Ctrl+C (KeyboardInterrupt) or abruptly kill the task, but that isn't what I'm looking for. I want a "soft" way to tell the script it's time to stop, not only because interruption can be unpredictable, but also because it's a harsh way to stop.
If that script was running in a Docker container (for example), you wouldn't be able to press Ctrl+C unless you happened to be in a terminal/bash session inside the container.
Or another situation: if that script was made for a customer, I don't think it's OK to tell the customer to just press Ctrl+C/kill the task to stop it. That's definitely counterintuitive, especially for a non-tech person.
I'm looking for a way to code another script (assuming that's a possible solution) that would set the attribute obj.loop_control to False, finishing the loop once the current iteration completes. Something that could be run by typing python stop_script.py in a (different) terminal.
It doesn't necessarily need to be this way; other solutions are also acceptable, as long as they don't involve KeyboardInterrupt or killing tasks. If I could use a method inside the class, that would be great, as long as I can call it from another terminal/script.
Is there a way to do this?
If you have a program that should run for an undetermined amount of time, how do you code something to stop it when the user decides it is time?
In general, there are two main ways of doing this (as far as I can see). The first one would be to make your script check some condition that can be modified from outside (like the existence or the content of some file/socket), or, as @Green Cloak Guy stated, to use pipes, which are one form of interprocess communication.
The second one would be to use the built-in mechanism for interprocess communication called signals, which exists in every OS where Python runs. When the user presses Ctrl+C, the terminal sends a specific signal to the process in the foreground. But you can send the same (or another) signal programmatically (i.e. from another script).
Reading the answers to your other question, I would say that what is missing to address this one is a way to send the appropriate signal to your already running process. Essentially this can be done by using the os.kill() function. Note that, although the function is called 'kill', it can send any signal (not only SIGKILL).
In order for this to work you need to have the process id of the running process. A common approach is to have your script save its process id, when it launches, into a file stored in a well-known location. To get the current process id you can use the os.getpid() function.
So, summing up, I'd say the steps to achieve what you want would be:
Modify your current script to store its process id (obtained via os.getpid()) in a file in a common location, for example /tmp/myscript.pid. Note that if you want your script to be portable, you will need to address this in a way that also works on non-Unix-like OSs such as Windows.
Choose one signal (typically SIGINT, SIGTERM, or SIGUSR1; note that SIGSTOP and SIGKILL cannot be caught) and modify your script to register a custom handler using signal.signal() that performs the graceful termination of your script.
Create another script (note that it could be the same script with some command-line parameter) that reads the process id from the known file (e.g. /tmp/myscript.pid) and sends the chosen signal to that process using os.kill().
Note that an advantage of using signals to achieve this instead of an external mechanism (files, pipes, etc.) is that the user can still press Ctrl+C (if you chose SIGINT), and that will produce the same behavior as the 'stop script' would.
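A minimal sketch of those steps on a POSIX system (the pid-file path /tmp/myscript.pid is just the example location from above):
# main.py: store the pid, install a handler, loop until told to stop
import os
import signal
import time

running = True

def handle_stop(signum, frame):
    global running
    running = False  # let the loop finish its current iteration

if __name__ == "__main__":
    with open("/tmp/myscript.pid", "w") as f:
        f.write(str(os.getpid()))
    signal.signal(signal.SIGTERM, handle_stop)
    signal.signal(signal.SIGINT, handle_stop)  # Ctrl+C now stops gracefully too
    while running:
        time.sleep(1)  # fetch and write data here
The stop script then only needs to read the pid back and signal it:
# stop_script.py
import os
import signal

with open("/tmp/myscript.pid") as f:
    os.kill(int(f.read()), signal.SIGTERM)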
What you're really looking for is any way to send a signal from one program to another, independent, program. One way to do this would be to use an inter-process pipe. Python has a module for this (which does, admittedly, seem to require a POSIX-compliant shell, but most major operating systems should provide that).
What you'll have to do is agree on a filepath beforehand between your running program (let's say main.py) and your stopping program (let's say stop.sh). Then you might make the main program run until someone inputs something to that pipe:
import pipes
...
t = pipes.Template()
# create a pipe in the first place
t.open("/tmp/pipefile", "w")
# create a lasting pipe to read from that
pipefile = t.open("/tmp/pipefile", "r")
...
And now, inside your program, change your loop condition to "as long as there's no input from this file". Unless someone writes something to it, .read() will return an empty string:
while not pipefile.read():
    # do stuff
To stop it, you use another file or script, or anything else that will write to that pipe. This is easiest to do with a shell script:
#!/usr/bin/env sh
echo STOP >> /tmp/pipefile
which, if you're containerizing this, you could put in /usr/bin and name it stop, give it at least 0111 permissions, and tell your user "to stop the program, just do docker exec containername stop".
(using >> instead of > is important because we just want to append to the pipe, not to overwrite it).
Proof of concept on my python console:
>>> import pipes
>>> t = pipes.Template()
>>> t.open("/tmp/file1", "w")
<_io.TextIOWrapper name='/tmp/file1' mode='w' encoding='UTF-8'>
>>> pipefile = t.open("/tmp/file1", "r")
>>> i = 0
>>> while not pipefile.read():
...     i += 1
...
At this point I go to a different terminal tab and do
$ echo "Stop" >> /tmp/file1
then I go back to my python tab, and the while loop is no longer executing, so I can check what happened to i while I was gone.
>>> print(i)
1704312

How to keep a While True loop running with raw_input() if inputs are seldom?

I'm currently working on a project where I need to send data via Serial persistently, but I occasionally need to change that data based on new inputs. My issue is that my current loop only runs when a new input is offered by raw_input(); nothing runs again until another raw_input() is received.
My current (very slimmed down) loop looks like this:
while True:
    foo = raw_input()
    print(foo)
I would like the latest values to be printed (or passed to another function) constantly, regardless of how often changes occur.
Any help is appreciated.
The select (or in Python 3.4+, selectors) module can allow you to solve this without threading, while still performing periodic updates.
Basically, you just write the normal loop but use select to determine if new input is available, and if so, grab it:
import select
import sys

foo = ''  # most recent value seen so far
while True:
    # Polls for availability of data on stdin without blocking
    if select.select((sys.stdin,), (), (), 0)[0]:
        foo = raw_input()
    print(foo)
As written, this would print far more than you probably want; you could either time.sleep after each print, or change the timeout argument of select.select to something other than 0. If you make it 1, for instance, you'll update immediately when new data is available; otherwise, you'll wait a second before giving up and printing the old data again.
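For instance, a sketch of that last variant, with a 1-second select timeout standing in for the sleep (still Python 2, to match the question's raw_input):
import select
import sys

foo = ''  # latest value received so far
while True:
    # Block for up to 1 second waiting for new data on stdin
    if select.select((sys.stdin,), (), (), 1)[0]:
        foo = raw_input()
    print(foo)  # reprints the latest value roughly once per second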
How will you type in your data at the same time that data is being printed? You can use multithreading, as long as you make sure your source of data doesn't interfere with your output of data.
import thread

def give_output():
    while True:
        pass  # output stuff here

def get_input():
    while True:
        pass  # get input here

thread.start_new_thread(give_output, ())
thread.start_new_thread(get_input, ())
Your source of data could be another program. You could connect them using a file or a socket.
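To make that skeleton concrete, here is a minimal sketch (the shared latest cell and the one-second cadence are our own illustration; replace the print with your Serial write):
import thread
import time

latest = ['initial value']  # single shared cell, updated by the input thread

def give_output():
    while True:
        print(latest[0])  # output stuff here, e.g. write to Serial
        time.sleep(1)

def get_input():
    while True:
        latest[0] = raw_input()  # blocks, but only this thread waits

thread.start_new_thread(get_input, ())
give_output()  # keep the main thread alive running the output loop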

enable a script to stop itself when a command is issued from terminal

I have a script runReports.py that is executed every night. Suppose that for some reason the script takes too long to execute; I want to be able to stop it from the terminal by issuing a command like ./runReports.py stop.
I tried to implement this by having the script to create a temporary file when the stop command is issued.
The script checks for existence of this file before running each report.
If the file is there the script stops executing, else it continues.
But I am not able to find a way to make the issuer of the stop command aware that the script has stopped successfully. Something along the following lines:
$ ./runReports.py stop
Stopping runReports...
runReports.py stopped successfully.
How to achieve this?
For example, if your script runs in a loop, you can catch a signal (http://en.wikipedia.org/wiki/Unix_signal) and terminate the process gracefully:
import signal

class SimpleReport(BaseReport):
    def __init__(self):
        ...
        self.is_running = True

    def _signal_handler(self, signum, frame):
        self.is_running = False

    def run(self):
        signal.signal(signal.SIGUSR1, self._signal_handler)  # set signal handler
        ...
        while self.is_running:
            print("Preparing report")
        print("Exiting ...")
To terminate the process, just call kill -SIGUSR1 procId.
You want to achieve inter-process communication. You should first explore the different ways to do that: System V IPC (in-memory, very versatile, possibly baffling API), sockets (including Unix domain sockets) (in-memory, more limited, clean API), and the file system (persistent on disk, almost architecture-independent), and then choose yours.
As you are asking about files, there are still two ways to communicate using them: either using file content (feature-rich, harder to implement), or simply file presence. But the problem with using files is that if a program terminates because of an error, it may not be able to write its ended status to disk.
IMHO, you should clearly define what your requirements are before choosing file-system-based communication (testing whether a program has ended is not really what it is best at), unless you also need architecture independence.
To directly answer your question: the only reliable way to know whether a program has ended, if you use file-system communication, is to browse the list of currently active processes, and the simplest way is IMHO to run ps -e in a subprocess.
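A rough sketch of that check (the script name runReports.py is the asker's; matching it against the full ps output is crude, but serviceable):
import subprocess

def script_is_running(name):
    # List every active process with its full command line and
    # look for the script's name anywhere in that output.
    output = subprocess.check_output(["ps", "-e", "-o", "args"]).decode()
    return any(name in line for line in output.splitlines())

print(script_is_running("runReports.py"))
The stop command could poll this until it returns False and then report that the script has stopped successfully.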
Instead of having a temporary file, you could have a permanent file (config.txt) that contains some tags, and check whether it holds the tag running = True.
Achieving this is quite simple if your code has a loop in it (I imagine it does): just make a function/method that checks the condition in this file.
def continue_running():
    with open("config.txt") as f:
        for line in f:
            tag, condition = line.strip().split(" = ")
            if tag == "running" and condition == "True":
                return True
    return False
In your script you will do this:
while True:  # or your termination condition
    if continue_running():
        pass  # your regular code goes here
    else:
        break
So all you have to do to stop the loop in the script is change the running tag to anything but "True".
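A stop script then only has to rewrite that tag, for example:
# stop_script.py: flip the tag so the main loop exits at its next check
with open("config.txt", "w") as f:
    f.write("running = False\n")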

File Lock in Python [duplicate]

I need to lock a file for writing in Python. It will be accessed from multiple Python processes at once. I have found some solutions online, but most fail for my purposes as they are often only Unix based or Windows based.
Alright, so I ended up going with the code I wrote here, on my website (the link is dead; view it on archive.org, or on GitHub). I can use it in the following fashion:
from filelock import FileLock

with FileLock("myfile.txt"):
    # work with the file as it is now locked
    print("Lock acquired.")
The other solutions cite a lot of external code bases. If you would prefer to do it yourself, here is some code for a cross-platform solution that uses the respective file-locking tools on POSIX and Windows systems.
try:
    # Posix based file locking (Linux, Ubuntu, MacOS, etc.)
    # Only allows locking on writable files, might cause
    # strange results for reading.
    import fcntl, os
    def lock_file(f):
        if f.writable(): fcntl.lockf(f, fcntl.LOCK_EX)
    def unlock_file(f):
        if f.writable(): fcntl.lockf(f, fcntl.LOCK_UN)
except ModuleNotFoundError:
    # Windows file locking
    import msvcrt, os
    def file_size(f):
        return os.path.getsize(os.path.realpath(f.name))
    def lock_file(f):
        msvcrt.locking(f.fileno(), msvcrt.LK_RLCK, file_size(f))
    def unlock_file(f):
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, file_size(f))

# Class for ensuring that all file operations are atomic, treat
# initialization like a standard call to 'open' that happens to be atomic.
# This file opener *must* be used in a "with" block.
class AtomicOpen:
    # Open the file with arguments provided by user. Then acquire
    # a lock on that file object (WARNING: Advisory locking).
    def __init__(self, path, *args, **kwargs):
        # Open the file and acquire a lock on the file before operating
        self.file = open(path, *args, **kwargs)
        # Lock the opened file
        lock_file(self.file)

    # Return the opened file object (knowing a lock has been obtained).
    def __enter__(self, *args, **kwargs):
        return self.file

    # Unlock the file and close the file object.
    def __exit__(self, exc_type=None, exc_value=None, traceback=None):
        # Flush to make sure all buffered contents are written to file.
        self.file.flush()
        os.fsync(self.file.fileno())
        # Release the lock on the file.
        unlock_file(self.file)
        self.file.close()
        # Handle exceptions that may have come up during execution, by
        # default any exceptions are raised to the user.
        if exc_type is not None:
            return False
        else:
            return True
Now, AtomicOpen can be used in a with block where one would normally use an open statement.
WARNINGS:
If running on Windows and Python crashes before __exit__ is called, I'm not sure what the lock behavior would be.
The locking provided here is advisory, not absolute. All potentially competing processes must use the "AtomicOpen" class.
As of Nov 9th, 2020, this code only locks writable files on POSIX systems. At some point after the original posting and before this date, it became illegal to use fcntl.lockf on read-only files.
There is a cross-platform file locking module here: Portalocker
Although, as Kevin says, writing to a file from multiple processes at once is something you want to avoid if at all possible.
If you can shoehorn your problem into a database, you could use SQLite. It supports concurrent access and handles its own locking.
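For instance, a minimal sketch with the standard sqlite3 module (the database and table names are only examples):
import sqlite3

# SQLite serializes writers itself, so several processes can safely
# run this same code against one shared database file.
conn = sqlite3.connect("shared.db", timeout=10)  # wait up to 10 s on a locked db
conn.execute("CREATE TABLE IF NOT EXISTS log (line TEXT)")
with conn:  # opens a transaction, committed on success
    conn.execute("INSERT INTO log VALUES (?)", ("hello",))
conn.close()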
I have been looking at several solutions for this, and my choice has been oslo.concurrency. It's powerful and relatively well documented. It's based on fasteners.
Other solutions:
Portalocker: requires pywin32, which is an exe installation, so not possible via pip
fasteners: poorly documented
lockfile: deprecated
flufl.lock: NFS-safe file locking for POSIX systems.
simpleflock: Last update 2013-07
zc.lockfile: Last update 2016-06 (as of 2017-03)
lock_file: Last update 2007-10
I prefer lockfile — Platform-independent file locking
Locking is platform and device specific, but generally, you have a few options:
Use flock(), or equivalent (if your OS supports it). This is advisory locking: unless you check for the lock, it's ignored (see the sketch below).
Use a lock-copy-move-unlock methodology, where you copy the file, write the new data, then move it (move, not copy - move is an atomic operation in Linux -- check your OS), and you check for the existence of the lock file.
Use a directory as a "lock". This is necessary if you're writing to NFS, since NFS doesn't support flock().
There's also the possibility of using shared memory between the processes, but I've never tried that; it's very OS-specific.
For all these methods, you'll have to use a spin-lock (retry-after-failure) technique for acquiring and testing the lock. This does leave a small window for mis-synchronization, but it's generally small enough not to be a major issue.
If you're looking for a solution that is cross-platform, then you're better off logging to another system via some other mechanism (the next best thing is the NFS technique above).
Note that sqlite is subject to the same constraints over NFS that normal files are, so you can't write to an sqlite database on a network share and get synchronization for free.
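Here is the flock() sketch promised above, using the stdlib fcntl module on a POSIX system (the file name is only an example):
import fcntl

# Advisory lock: every cooperating process must take the same lock,
# or it is simply ignored.
with open("shared.txt", "a") as f:
    fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
    f.write("one line, written while holding the lock\n")
    f.flush()
    fcntl.flock(f, fcntl.LOCK_UN)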
Coordinating access to a single file at the OS level is fraught with all kinds of issues that you probably don't want to solve.
Your best bet is to have a separate process that coordinates read/write access to that file.
Here's an example of how to use the filelock library, which is similar to Evan Fossmark's implementation:
from filelock import FileLock

lockfile = r"c:\scr.txt"
lock = FileLock(lockfile + ".lock")
with lock:
    file = open(lockfile, "w")
    file.write("123")
    file.close()
Any code within the with lock: block is protected by the lock, meaning that it will be finished before another process is given access to the file.
Locking a file is usually a platform-specific operation, so you may need to allow for the possibility of running on different operating systems. For example:
import os

def my_lock(f):
    if os.name == "posix":
        pass  # Unix or OS X specific locking here
    elif os.name == "nt":
        pass  # Windows specific locking here
    else:
        print("Unknown operating system, lock unavailable")
I have been working in a situation like this, where I run multiple copies of the same program from within the same directory/folder, all logging errors. My approach was to write a "lock file" to the disk before opening the log file. The program checks for the presence of the "lock file" before proceeding, and waits for its turn if the "lock file" exists.
Here is the code:
from datetime import datetime
from os import remove, stat
from os.path import exists
from time import time

def errlogger(error):
    while True:
        if not exists('errloglock'):
            lock = open('errloglock', 'w')
            if exists('errorlog'):
                log = open('errorlog', 'a')
            else:
                log = open('errorlog', 'w')
            log.write(str(datetime.utcnow())[0:-7] + ' ' + error + '\n')
            log.close()
            remove('errloglock')
            return
        else:
            check = stat('errloglock')
            if time() - check.st_ctime > 0.01:
                remove('errloglock')  # break stale locks
            print('waiting my turn')
EDIT:
After thinking over some of the comments about stale locks, I edited the code to add a check for staleness of the "lock file." Timing several thousand iterations of this function on my system gave an average of 0.002066... seconds from just before:
lock = open('errloglock', 'w')
to just after:
remove('errloglock')
so I figured I would start with 5 times that amount to indicate staleness and monitor the situation for problems.
Also, as I was working with the timing, I realized that I had a bit of code that was not really necessary:
lock.close()
which I had immediately following the open statement, so I have removed it in this edit.
This worked for me:
Do not keep one large file occupied; distribute the data across several small ones.
You create a Temp file, delete file A, and then rename the Temp file to A.
import os
import json
import time

# File_Temp, File_A and DATA are assumed to be defined elsewhere.

def Server():
    i = 0
    while i == 0:
        try:
            with open(File_Temp, "w") as file:
                json.dump(DATA, file, indent=2)
            if os.path.exists(File_A):
                os.remove(File_A)
            os.rename(File_Temp, File_A)
            i = 1
        except OSError as e:
            print("file locked: ", str(e))
            time.sleep(1)

def Clients():
    i = 0
    while i == 0:
        try:
            if os.path.exists(File_A):
                with open(File_A, "r") as file:
                    DATA_Temp = file.read()
                DATA = json.loads(DATA_Temp)
                i = 1
        except OSError as e:
            print(str(e))
            time.sleep(1)
The scenario is like this: the user requests a file to do something. Then, if the user sends the same request again, they are informed that the second request will not be done until the first request finishes. That's why I use a lock mechanism to handle this issue.
Here is my working code:
from lockfile import LockFile

def lock_status(lock_file_path):  # wrapper added; the original returns from an enclosing function
    lock = LockFile(lock_file_path)
    status = ""
    if not lock.is_locked():
        lock.acquire()
        status = lock.path + ' is locked.'
        print status
    else:
        status = lock.path + " is already locked."
        print status
    return status
I found a simple implementation that worked(!) in grizzled-python.
Simply using os.open(..., O_EXCL) + os.close() didn't work on Windows.
You may find pylocker very useful. It can be used to lock a file, or for locking mechanisms in general, and can be accessed from multiple Python processes at once.
If you simply want to lock a file, here's how it works:
import uuid
from pylocker import Locker

# create a unique lock pass. This can be any string.
lpass = str(uuid.uuid1())

# create locker instance.
FL = Locker(filePath='myfile.txt', lockPass=lpass, mode='w')

# acquire the lock
with FL as r:
    # get the result
    acquired, code, fd = r
    # check if acquired.
    if fd is not None:
        print fd
        fd.write("I have successfully acquired the lock !")

# no need to release anything or to close the file descriptor,
# with statement takes care of that. let's print fd and verify that.
print fd
If you just need Mac/POSIX, this should work without external packages.
import sys
import stat
import os

filePath = "<PATH TO FILE>"
if sys.platform == 'darwin':
    flags = os.stat(filePath).st_flags
    if not flags & stat.UF_IMMUTABLE:
        os.chflags(filePath, flags | stat.UF_IMMUTABLE)
and if you want to unlock the file, just change it to:
if flags & stat.UF_IMMUTABLE:
    os.chflags(filePath, flags & ~stat.UF_IMMUTABLE)
