I want to open a file which is periodically written to by another application. This application cannot be modified. I'd therefore like to open the file only when I know it is not being written to by another application.
Is there a pythonic way to do this? Otherwise, how do I achieve this in Unix and Windows?
edit: I'll try and clarify. Is there a way to check if the current file has been opened by another application?
I'd like to start with this question. Whether those other applications read or write is irrelevant for now.
I realize it is probably OS dependent, so this may not really be python related right now.
Will your python script desire to open the file for writing or for reading? Is the legacy application opening and closing the file between writes, or does it keep it open?
It is extremely important that we understand what the legacy application is doing, and what your python script is attempting to achieve.
This area of functionality is highly OS-dependent, and the fact that you have no control over the legacy application only makes things harder unfortunately. Whether there is a pythonic or non-pythonic way of doing this will probably be the least of your concerns - the hard question will be whether what you are trying to achieve will be possible at all.
UPDATE
OK, so knowing (from your comment) that:
the legacy application is opening and closing the file every X minutes, but I do not want to assume that at t = t_0 + n*X + eps it already closed the file.
then the problem's parameters are changed. It can actually be done in an OS-independent way given a few assumptions, or as a combination of OS-dependent and OS-independent techniques. :)
OS-independent way: this works if it is safe to assume that the legacy application keeps the file open for at most some known amount of time, say T seconds (e.g. it opens the file, performs one write, then closes it), and re-opens it more or less every X seconds, where X is larger than 2*T. Then:
stat the file
subtract file's modification time from now(), yielding D
if T <= D < X then open the file and do what you need with it
This may be safe enough for your application. Safety increases as T/X decreases. On *nix you may have to double check /etc/ntpd.conf for proper time-stepping vs. slew configuration (see tinker). For Windows see MSDN
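A minimal sketch of the stat-based check in Python; the file name and the T/X values are placeholders you would measure for your own setup:

```python
import os
import time

T = 5              # hypothetical: longest time the legacy app keeps the file open (seconds)
X = 60             # hypothetical: interval between the legacy app's writes (seconds), X > 2*T
PATH = "data.txt"  # hypothetical file name

def safe_to_open(path, t=T, x=X):
    """Return True when the file was last modified at least T seconds ago
    but less than X seconds ago, i.e. the legacy app should have closed it."""
    d = time.time() - os.stat(path).st_mtime
    return t <= d < x

# Usage (assumes PATH exists):
#   if safe_to_open(PATH):
#       with open(PATH) as f:
#           data = f.read()
```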
Windows: in addition (or in-lieu) of the OS-independent method above, you may attempt to use either:
sharing (locking): this assumes that the legacy program also opens the file in shared mode (usually the default in Windows apps); moreover, if your application acquires the lock just as the legacy application is attempting the same (race condition), the legacy application will fail.
this is extremely intrusive and error prone. Unless both the new application and the legacy application need synchronized access for writing to the same file and you are willing to handle the possibility of the legacy application being denied opening of the file, do not use this method.
attempting to find out what files are open in the legacy application, using the same techniques as ProcessExplorer (the equivalent of *nix's lsof)
you are even more vulnerable to race conditions than the OS-independent technique
Linux/etc.: in addition (or in-lieu) of the OS-independent method above, you may attempt to use the same technique as lsof or, on some systems, simply check which file the symbolic link /proc/<pid>/fd/<fdes> points to
you are even more vulnerable to race conditions than the OS-independent technique
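A rough, Linux-only sketch of the /proc scan; it inherits the race condition mentioned above, since the result is stale the moment the function returns:

```python
import os

def pids_with_file_open(path):
    """Scan /proc/<pid>/fd/* symlinks (Linux only) and return the PIDs
    that currently have `path` open."""
    target = os.path.realpath(path)
    owners = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            for fd in os.listdir(fd_dir):
                if os.path.realpath(os.path.join(fd_dir, fd)) == target:
                    owners.append(int(pid))
                    break
        except OSError:  # process exited, or permission denied
            continue
    return owners
```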
it is highly unlikely that the legacy application uses locking, but if it does, locking is not a real option unless the legacy application can handle a locked file gracefully (by blocking, not by failing), and unless your own application can guarantee that it will not keep the file locked, and thus the legacy application blocked, for extended periods of time.
UPDATE 2
If you favour the "check whether the legacy application has the file open" route (the intrusive approach prone to race conditions), you can close the race window with the following steps (sketched in code after the list):
checking whether the legacy application has the file open (a la lsof or ProcessExplorer)
suspending the legacy application process
repeating the check in step 1 to confirm that the legacy application did not open the file between steps 1 and 2; delay and restart at step 1 if so, otherwise proceed to step 4
doing your business on the file -- ideally simply renaming it for subsequent, independent processing in order to keep the legacy application suspended for a minimal amount of time
resuming the legacy application process
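If you go this route, the third-party psutil package wraps the open-file check and the suspend/resume calls portably. A sketch under the assumption that you know the legacy application's PID; the PID and paths below are placeholders:

```python
import os
import psutil  # third-party; assumes the legacy app's PID is known

LEGACY_PID = 1234                 # hypothetical PID of the legacy application
PATH = "/var/data/output.txt"     # hypothetical file written by it

def grab_file(pid, path, new_path):
    proc = psutil.Process(pid)
    # Step 1: is the file currently open?
    if any(f.path == path for f in proc.open_files()):
        return False
    # Step 2: suspend the legacy process.
    proc.suspend()
    try:
        # Step 3: re-check; it may have opened the file in the meantime.
        if any(f.path == path for f in proc.open_files()):
            return False  # caller should delay and retry
        # Step 4: rename for independent processing, keeping the
        # suspension window as short as possible.
        os.rename(path, new_path)
        return True
    finally:
        # Step 5: resume the legacy process no matter what happened.
        proc.resume()

# Usage (with a real PID and paths):
#   grab_file(LEGACY_PID, PATH, PATH + ".processing")
```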
Unix does not have file locking as a default. The best suggestion I have for a Unix environment would be to look at the sources for the lsof command. It has deep knowledge about which processes have which files open. You could use that as the basis of your solution. Here are the Ubuntu sources for lsof.
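If shelling out is acceptable, you can also just ask the installed lsof binary rather than reimplementing it. A sketch; note the result is only a snapshot and is immediately stale:

```python
import subprocess

def file_is_open(path):
    """Ask lsof whether any process currently has `path` open.
    lsof exits with 0 when it listed at least one open file,
    and non-zero when nothing has the file open (or on error)."""
    result = subprocess.run(
        ["lsof", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```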
One thing I've done is have python very temporarily rename the file. If we're able to rename it, then no other process is using it. I only tested this on Windows.
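A sketch of that rename probe; the temporary name is arbitrary, and on Windows the rename fails while another process holds the file open, so success is a best-effort sign that nobody is using it:

```python
import os

def is_free(path):
    """Try to rename the file away and back again.
    If both renames succeed, no other process had it open (Windows)."""
    probe = path + ".probe"  # hypothetical temporary name
    try:
        os.rename(path, probe)
        os.rename(probe, path)
        return True
    except OSError:
        return False
```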
Related
I need to get several processes to communicate with each others on a macOS system. These processes will be spawned several times per day at different times, and I cannot predict when they will be up at the same time (if ever). These programs are in python or swift.
How can I safely allow these programs to all write to the same file?
I have explored a few different options:
I thought of using sqlite3, however I couldn't find an answer in the documentation on whether it is safe to write concurrently across processes. This question is old and not very definitive, and I would ideally like a more authoritative answer.
I thought of using multiprocessing as it supports locks. However, as far as I could see in the documentation, you need a meta-process that spawns the children and stays up for the duration of the longest child process. I am fine having a meta-spawner process, but it feels wasteful to have a meta-process basically staying up all day long just to resolve conflicting access?
Along the lines of this, I thought of having a process that stays up all day long, receives messages from all other processes, and is solely responsible for writing to the file. It feels a little wasteful; how worried should I be about the resource cost of having a program up all day doing very little? Are memory footprint and CPU usage (as shown in Activity Monitor) the only things to worry about, or could there be other significant costs, e.g. context switching?
I have come across flock on Linux, which seems to be an OS-provided mechanism for locking access to files. This seems like a good solution, but it does not seem to exist on macOS?
Any idea for solving this requirement in a robust manner (so that I don't have to debug every other day - I know concurrency can be a pain) is most welcome!
Since you are in control of the source code of all such processes, you can use flock. It puts an advisory lock on the file, so another writer is blocked only if it also accesses the file the same way. That is fine for you, as long as only your own processes will ever need to write to the shared file.
I've tested flock on BigSur, it is still implemented and works fine.
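From Python, flock is exposed through the standard fcntl module on both Linux and macOS. A minimal sketch; the file name and record format are just examples:

```python
import fcntl

# Every cooperating process uses the same pattern around its writes.
with open("shared_output.txt", "a") as f:   # hypothetical shared file
    fcntl.flock(f, fcntl.LOCK_EX)           # blocks until the lock is free
    try:
        f.write("one record written under the lock\n")
        f.flush()
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)       # release for the next writer
```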
You can also do it in any other common manner: create a temporary .lock file in a known location (this is what git does) and remove it once the current writer is done with the main file; use semaphores; etc.
I have to setup a program which reads in some parameters from a widget/gui, calculates some stuff based on database values and the input, and finally sends some ascii files via ftp to remote servers.
In general, I would suggest a Python program to do the tasks. Write a Qt widget as a GUI (interactively changing views, putting numbers into tables, setting up check boxes, switching between various layers - I've never done something as complex in Python, but I have some experience in IDL with event handling etc.), and set up data classes that have functions both to create the ASCII files with the given convention and to send the files via FTP to some remote server.
However, since my company is a bunch of Windows users, each sitting at their personal desktop, installing python and all necessary libraries on each individual machine would be a pain in the ass.
In addition, in a future version the program is supposed to become smart and do some optimization 24/7. Therefore, it makes sense to put it to a server. As I personally rather use Linux, the server is already set up using Ubuntu server.
The idea is now to run my application on the server. But how can the users access and control the program?
The easiest way for everybody to access something like a common control panel would be a browser, I guess. I have to make sure that only one person at a time is sending signals to the same units, but that should be doable via flags in the database.
After some google-ing, next to QtWebKit, Django seems to be the first choice for such a task. But...
Can I run a full fledged python program underneath my web application? Is django the right tool to do so?
As mentioned previously, in the (intermediate) future (~1 year), we might have to implement some computationally expensive tasks. Is it then also possible to call into C, as one would from a normal Python program?
Another question I have is on the development. In order to become productive, we have to advance in small steps. Can I first create regular python classes, which later on can be imported to my web application? (Same question applies for widgets / QT?)
Finally: Is there a better way to go? Any standards, any references?
Django is a good candidate for the website, however:
It is not a good idea to run heavy functionality from a website. It should happen in a separate process.
All functions should be asynchronous, i.e. you should never wait for something to complete.
I would personally recommend writing a separate process with a message queue; the website would only ask that process for status and always display a result immediately to the user.
You can use ajax so that the browser will always have the latest result.
ZeroMQ or Celery are useful for implementing the functionality.
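To make the flow concrete, here is a hedged Celery sketch; the broker URL, module name, and task are all hypothetical. The point is that the website only enqueues work and polls for the result:

```python
# tasks.py -- hypothetical Celery worker module; the Redis URLs are assumptions
from celery import Celery

app = Celery("tasks",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")

@app.task
def heavy_calculation(params):
    # Expensive work runs in a separate worker process, not in the web app.
    return {"input": params, "result": sum(params)}

# In a Django view (or an ajax polling endpoint):
#   async_result = heavy_calculation.delay([1, 2, 3])   # returns immediately
#   if async_result.ready():
#       data = async_result.get()
```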
You can implement functionality in C pretty easily. I recommend, however, that you write that functionality as pure C with a SWIG wrapper rather than writing it as an extension module for Python. That way the functionality will be portable and not dependent on the Python website.
I have noticed that python always remembers where it finished writing in a file and continues from that point.
Is there a way to reset that, so that if the file is edited by another program that removes some text and adds other text, Python will not fill the gap with NULL bytes on its next write?
I have the file open in the parent and the threading children are writing to it. I used flush to ensure the data is physically written to the file after each write, but that is all it is good for.
Is there another function I seem to miss that will make python append properly?
One safe, OS-independent, and reliable approach is certainly to close the file and open it again when writing.
If the performance hindrance due to that is unacceptable, you could try to use "seek" to move to the end of file before writing. I just did some naive testing in the interactive console, and indeed, using file.seek(0, os.SEEK_END) before writing worked.
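A sketch of that seek-before-write idea; the file name is a placeholder and the file is assumed to already exist:

```python
import os

f = open("output.txt", "r+")      # hypothetical file, kept open in the parent

def append_line(line):
    f.seek(0, os.SEEK_END)        # re-sync our position with the file's current end
    f.write(line + "\n")
    f.flush()                     # push the data out, as in the question
```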
Note that I don't think having two processes writing to the same file can be safe under most circumstances -- you will end up with race conditions of some sort doing this. One way around that is to implement file locks, so that a process only writes to the file after acquiring the lock. Getting this right can be tough, so check whether your application would not be better off using something built and hardened over the years to allow simultaneous updates by various processes, like an SQL engine (MySQL or PostgreSQL).
I need to modify a text file at runtime but restore its original state later (even if the computer crash).
My program runs in regular sessions. Once a session ended, the original state of that file can be changed, but the original state won't change at runtime.
There are several instances of this text file with the same name in several directories. My program runs in each directory (but not in parallel), and depending on the directory's contents it does different things. The order in which working directories are chosen is completely arbitrary.
Since the file's name is the same in each directory, it seems a good idea to store the backed up file in slightly different places (ie. the parent directory name could be appended to the backup target path).
What I do now is backup and restore the file with a self-written class, and also check at startup if the previous backup for the current directory was properly restored.
But my implementation needs serious refactoring, and now I'm interested if there are libraries already implemented for this kind of task.
edit
version control seems like a good idea, but actually it's a bit overkill, since it requires a network connection and often a server. Other VCSs need clients to be installed. I would be happier with a pure-Python solution, but at least it should be cross-platform, portable and small enough (<10 MB, for example).
Why not just do what Unix, Mac, and Windows programs have done for years -- use a lockfile/working-file concept.
When a file is selected for edit:
Check to see if there is an active lock or a crashed backup.
If the file is locked or a crashed backup exists, give a "recover" option
Otherwise, begin editing the file...
The editing tends to do one or more of a few things:
Copy the original file into a ".%(filename)s.backup"
Create a ".%(filename)s.lock" to prevent others from working on it
When editing is achieved, the lock goes away and the .backup is removed
Sometimes things are slightly reversed, and the original stays in place while a .backup is the active edit; on success the .backup replaces the original
If you crash vi or some other text editors on a Linux box, you'll see these files created. Note that they usually have a dot (.) prefix so they're normally hidden on the command line. Word/PowerPoint/etc. all do similar things.
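A bare-bones sketch of that convention in Python; the naming scheme follows the answer above, and the error handling is illustrative only:

```python
import os
import shutil

def begin_edit(path):
    """Refuse to edit if a lock or stale backup exists; otherwise take a
    backup of the original and drop a lock file next to it."""
    d, name = os.path.split(path)
    lock = os.path.join(d, ".%s.lock" % name)
    backup = os.path.join(d, ".%s.backup" % name)
    if os.path.exists(lock) or os.path.exists(backup):
        raise RuntimeError("file is locked or a previous edit crashed -- offer recovery")
    shutil.copy2(path, backup)        # preserve the original state
    with open(lock, "w") as f:
        f.write(str(os.getpid()))     # record who holds the lock
    return lock, backup

def end_edit(lock, backup):
    """Edit succeeded: remove the lock and the backup."""
    os.remove(lock)
    os.remove(backup)
```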
Implement version control... like SVN (see pysvn). It should be fast as long as the repo is on the same server, and it allows rollbacks to any version of the file. Maybe overkill, but that will make everything reversible.
http://pysvn.tigris.org/docs/pysvn_prog_guide.html
You don't need a server... you can have local version control and it should be fine...
Git, Subversion or Mercurial is your friend.
I would like to launch an untrusted application programmatically, so I want to remove the program's ability to access files, network, etc. Essentially, I want to restrict it so its only interface to the rest of the computer is stdin and stdout.
Can I do that? Preferably in a cross-platform way but I sort of expect to have to do it differently for each OS. I'm using Python, but I'm willing to write this part in a lower level or more platform integrated language if necessary.
The reason I need to do this is to write a distributed computing infrastructure. It needs to download a program, execute it, piping data to stdin, and returning data that it receives on stdout to the central server. But since the program it downloads is untrusted, I want to restrict it to only using stdin and stdout.
The short answer is no.
The long answer is not really. Consider a C program that opens a log file by grabbing the next available file descriptor. Your program, in order to stop this, would need to somehow monitor and block that. Depending on the robustness of the untrusted program, this could cause a fatal crash or inhibit harmless functionality. There are many other similar issues that make what you are trying to do hard.
I would recommend looking into sandboxing solutions already available. In particular, a virtual machine can be very useful for testing out untrusted code. If you can't find anything that meets your needs, your best bet is to probably deal with this at the kernel level, or with something a bit closer to the hardware such as C.
Yes, you can do this. You can run an inferior process through ptrace (essentially you act as a debugger), hook its system calls, and decide whether each should be allowed or not.
codepad.org does this for instance, see: about codepad. It uses the geordi supervisor to execute the untrusted code.
You can run untrusted apps in a chroot and block them from using the network with an iptables rule (for example, an owner/--uid-owner match).
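A rough sketch of that setup from Python; it must be started as root, the jail directory, uid and gid are placeholders, and this is not a complete sandbox (network blocking still has to be done separately, e.g. with the iptables owner match mentioned above):

```python
import os
import subprocess

def run_untrusted(cmd, jail_dir, uid, gid):
    """Chroot into a prepared jail directory and drop root privileges
    before exec'ing the untrusted program, talking to it only via
    stdin/stdout pipes. The jail must already contain everything the
    program needs to run."""
    def sandbox():
        os.chroot(jail_dir)
        os.chdir("/")
        os.setgid(gid)
        os.setuid(uid)   # drop privileges last
    return subprocess.Popen(
        cmd,
        preexec_fn=sandbox,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```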
But really, a virtual machine is more reliable, and on modern hardware the performance impact is negligible.