Is closing file descriptor and removing inotify watch really necessary? - python

With Python's inotifyx, do I have to remove the watch and close the opened system file descriptor if I need them until the program exits? E.g. are there any possible problems if I create one (file descriptor + watch) on each run and don't close it?

It's always a good idea to release resources (e.g. free memory, close file descriptors, waitpid(2) on child processes, etc) whenever you're done using them. Being lazy and letting the operating system take care of it for you when you exit is a sure way to cause bugs in the future.

The kernel stores watches as full paths, so removing the watch is preferable; it also takes unnecessary work off the VFS. As for the file descriptor, that depends on how many others you have open.
Kind of like a phone call: it's nice to tell the other party that you have stopped listening. Hanging up the phone is optional, but conventional. If you need it for something, keep it.
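One way to make the cleanup automatic is a context manager. The sketch below does not use inotifyx; it calls the libc inotify functions through ctypes so that it is self-contained, and it assumes Linux. The name inotify_watch is made up for illustration. Closing the inotify file descriptor also drops its watches, so the explicit inotify_rm_watch call is the "tell the other party" courtesy rather than a strict requirement.

```python
import ctypes
import os
from contextlib import contextmanager

IN_MODIFY = 0x00000002  # value from <sys/inotify.h>

# Assumption: Linux, where the inotify functions live in the C library.
_libc = ctypes.CDLL(None, use_errno=True)


@contextmanager
def inotify_watch(path, mask=IN_MODIFY):
    """Yield (fd, wd) for a watch on path; remove the watch and close fd on exit."""
    fd = _libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    wd = -1
    try:
        wd = _libc.inotify_add_watch(fd, os.fsencode(path), mask)
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        yield fd, wd
    finally:
        if wd >= 0:
            _libc.inotify_rm_watch(fd, wd)  # stop listening
        os.close(fd)                        # hang up
```

Used as `with inotify_watch("/some/dir") as (fd, wd): ...`, both resources are released even if the body raises.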

Related

Do you need to call close() if you already call flush(), to make content survive an unexpected reboot?

I'm writing to a file and I call flush() after I'm done. Is it safe to force quit the program, say by pulling the power plug, without calling close() on the file? Is calling flush() sufficient to ensure that the file won't become corrupt?
To ensure that content is flushed to the operating system (and is visible to other running applications), either flush() or close() will suffice (which is to say, you don't need both).
To ensure that content is flushed to disk, you also need to add os.fsync() or os.fdatasync(). Do note that in the case of a newly-created (or newly-renamed) file, you need to worry about whether the directory was flushed as well.
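A minimal sketch of the two levels described above, assuming a POSIX system (syncing a directory this way is not supported on Windows). The function name write_durably is hypothetical.

```python
import os


def write_durably(path, data):
    """Write bytes to path and ask the OS to push them to stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's buffer -> operating system
        os.fsync(f.fileno())  # operating system cache -> disk
    # For a newly created file, also sync the containing directory so that
    # the directory entry itself reaches the disk.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Whether the data truly survives a power cut still depends on the filesystem and the drive honouring the sync request.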
By the way -- if you care about being really sure things get to disk, the classic presentation Eat My Data is worth your time and attention.

Danger in corrupting file during unexpected shutdown

I am logging data via:
with open('filename.txt','a') as fid:
    fid.write(line_of_data)
Granted, the amount of time the file is open is short for each write, but I will write data every second, making it extremely repetitive. Since this is being used on a remote system there is always the chance that power will be interrupted, causing the computer to shut down. If power is cut in the middle of a fid.write(), will the whole file become corrupt, or, since it was opened to "append", will only the last line be lost?
It actually depends on the filesystem and operating system. When you "write" to a file, the data may not actually be written to the hard drive: it may be buffered by the OS, for example, and never actually "make it" to the hard drive itself.
You should not assume anything in that case, other than that anything may happen.
If you need some form of persistent writing, you probably need to use specialized libraries, which may add the required layers of safety.
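Short of a specialized library, one defensive pattern is to sync after every record and have the reader ignore a final record that lacks its newline. This is only a sketch under the assumption that each record is one newline-terminated line; append_line and read_complete_lines are hypothetical names, and it does not guarantee what a given filesystem does on power loss.

```python
import os


def append_line(path, line):
    """Append one newline-terminated record and request that it reach the disk."""
    with open(path, "a") as f:
        f.write(line.rstrip("\n") + "\n")
        f.flush()
        os.fsync(f.fileno())


def read_complete_lines(path):
    """Return only records that end in a newline, dropping a torn last line."""
    with open(path) as f:
        parts = f.read().split("\n")
    # The last element is "" when the file ends with a newline, or a partial
    # record when the final write was cut short; drop it either way.
    return parts[:-1]
```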

Why should I close files in Python? [duplicate]

Usually when I open files I never call the close() method, and nothing bad happens. But I've been told this is bad practice. Why is that?
For the most part, not closing files is a bad idea, for the following reasons:
It puts your program in the garbage collector's hands: although the file will in theory be closed automatically, that may not happen promptly, or at all. CPython generally does a pretty good job of garbage collecting, but not always, and other implementations generally suck at it.
It can slow down your program. Too many open files means more memory in use, which will impact performance.
For the most part, changes to files in Python do not take effect until the file is closed, so if your script edits a file, leaves it open, and then reads it, it won't see the edits.
You could, theoretically, run into the limit on how many files you can have open.
As @sai stated below, Windows treats open files as locked, so legitimate things like AV scanners or other Python scripts can't read the file.
It is sloppy programming (then again, I'm not exactly the best at remembering to close files myself!)
Found some good answers:
(1) It is a matter of good programming practice. If you don't close
them yourself, Python will eventually close them for you. In some
versions of Python, that might be the instant they are no longer
being used; in others, it might not happen for a long time. Under
some circumstances, it might not happen at all.
(2) When writing to a file, the data may not be written to disk until
the file is closed. When you say "output.write(...)", the data is
often cached in memory and doesn't hit the hard drive until the file
is closed. The longer you keep the file open, the greater the
chance that you will lose data.
(3) Since your operating system has strict limits on how many file
handles can be kept open at any one instant, it is best to get into
the habit of closing them when they aren't needed and not wait for
"maid service" to clean up after you.
(4) Also, some operating systems (Windows, in particular) treat open
files as locked and private. While you have a file open, no other
program can also open it, even just to read the data. This spoils
backup programs, anti-virus scanners, etc.
http://python.6.x6.nabble.com/Tutor-Why-do-you-have-to-close-files-td4341928.html
https://docs.python.org/2/tutorial/inputoutput.html
Open files use resources and may be locked, preventing other programs from using them. Anyway, it is good practice to use with when reading files, as it takes care of closing the file for you.
with open('file', 'r') as f:
    read_data = f.read()
Here's an example of something "bad" that might happen if you leave a file open.
Open a file for writing in your python interpreter, write a string to it, then open that file in a text editor. On my system, the file will be empty until I close the file handle.
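The same experiment can be scripted. This is a small sketch that compares the size on disk before and after close(); the temporary path is arbitrary. With CPython's default buffering, a short write like this typically stays in memory, so the first size is usually 0.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w")
f.write("hello")
size_while_open = os.path.getsize(path)   # data is likely still in Python's buffer
f.close()                                 # close() flushes the buffer
size_after_close = os.path.getsize(path)

print(size_while_open, size_after_close)
```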
The close() method of a file object flushes any unwritten information and closes the file object, after which no more writing can be done.
Python automatically closes a file when the reference object of a file is reassigned to another file. It is a good practice to use the close() method to close a file. Here is the link about the close() method. I hope this helps.
You only have to call close() when you're writing to a file.
Python automatically closes files most of the time, but sometimes it won't, so you want to call it manually just in case.
I had a problem with that recently:
I was writing some stuff to a file in a for-loop, but if I interrupted the script with ^C, a lot of data which should have been written to the file wasn't there. It looks like Python stops writing there for no reason. I had opened the file before the for loop. Then I changed the code so that Python opens and closes the file on every single pass of the loop.
Basically, if you write stuff just for yourself and you don't have any issues, it's fine; if you write stuff for more people than just yourself, put a close() in the code, because someone could randomly get an error message and you should try to prevent this.
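An alternative to reopening the file on every pass is to keep it open inside a with block, which closes (and so flushes) the file even when the loop is interrupted. A minimal sketch, with write_records as a hypothetical name:

```python
def write_records(path, records):
    """Write each record on its own line, closing the file even on interruption."""
    # The with block runs f.close() when the loop ends normally and also when
    # an exception such as KeyboardInterrupt (Ctrl-C) propagates out of it.
    with open(path, "w") as f:
        for record in records:
            f.write(record + "\n")
```

Everything written before the interruption ends up in the file; the KeyboardInterrupt itself still propagates to the caller.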

Python - Tailing a logfile - sleep() versus inotify?

I'm writing a Python script that needs to tail -f a logfile.
The operating system is RHEL, running Linux 2.6.18.
The normal approach I believe is to use an infinite loop with sleep, to continually poll the file.
However, since we're on Linux, I'm thinking I can also use something like pyinotify (https://github.com/seb-m/pyinotify) or Watchdog (https://github.com/gorakhargosh/watchdog) instead?
What are the pros/cons of this?
I've heard that using sleep(), you can miss events, if the file is growing quickly - is that possible? I thought GNU tail uses sleep as well anyhow?
Cheers,
Victor
The cleanest solution would be inotify in many ways - this is more or less exactly what it's intended for, after all. If the log file was changing extremely rapidly then you could potentially risk being woken up almost constantly, which wouldn't necessarily be particularly efficient - however, you could always mitigate this by adding a short delay of your own after the inotify filehandle returns an event. In practice I doubt this would be an issue on most systems, but I thought it worth mentioning in case your system is very tight on CPU resources.
I can't see how the sleep() approach would miss file updates except in cases where the file is truncated or rotated (i.e. renamed and another file of the same name created). These are tricky cases to handle however you do things, and you can use tricks like periodically re-opening the file by name to check for rotation. Read the tail man page because it handles many such cases, and they're going to be quite common for log files in particular (log rotation being widely considered to be good practice).
The downside of sleep() is of course that you'd end up batching up your reads with delays in between, and also that you have the overhead of constantly waking up and polling the file even when it's not changing. If you did this, say, once per second, however, the overhead probably isn't noticeable on most systems.
I'd say inotify is the best choice unless you want to remain compatible, in which case the simple fallback using sleep() is still quite reasonable.
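For reference, the sleep() fallback can be sketched as a generator. The parameters max_idle_polls and from_end are hypothetical knobs added here so the loop can be stopped and demonstrated; a real tail would normally poll forever. It does not handle rotation or truncation.

```python
import os
import time


def follow(path, poll_interval=1.0, max_idle_polls=None, from_end=True):
    """Yield complete lines appended to path, polling with sleep() like tail -f."""
    idle = 0
    pending = ""  # holds a partial line until its newline arrives
    with open(path) as f:
        if from_end:
            f.seek(0, os.SEEK_END)  # skip what is already in the file
        while max_idle_polls is None or idle < max_idle_polls:
            chunk = f.readline()
            if chunk:
                idle = 0
                pending += chunk
                if pending.endswith("\n"):
                    yield pending
                    pending = ""
                continue
            idle += 1
            time.sleep(poll_interval)
```

A caller would write `for line in follow("/var/log/app.log"): ...`, where the path is only an example.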
EDIT:
I just realised I forgot to mention - an easy way to check for a file being renamed is to perform an os.fstat(fd.fileno()) on your open filehandle and an os.stat() on the filename you opened and compare the results. If the os.stat() fails then the error will tell you if the file's been deleted, and if not then comparing the st_ino (the inode number) fields will tell you if the file's been deleted and then replaced with a new one of the same name.
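That comparison might look like the following sketch. The name was_rotated is hypothetical, and comparing st_dev as well as st_ino is an extra precaution not mentioned above.

```python
import os


def was_rotated(f, path):
    """Return True if path no longer refers to the file that f has open."""
    try:
        on_disk = os.stat(path)
    except FileNotFoundError:
        return True  # deleted or renamed, and not (yet) replaced
    opened = os.fstat(f.fileno())
    return (on_disk.st_dev, on_disk.st_ino) != (opened.st_dev, opened.st_ino)
```

When it returns True, the tailing loop would typically finish reading the old handle and then reopen the file by name.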
Detecting truncation is harder - effectively your read pointer remains at the same offset in the file and reading will return nothing until the file content size gets back to where you were - then the file will read from that point as normal. If you call os.stat() frequently you could check for the file size going backwards - alternatively you could use fd.tell() to record your current position in the file and then perform an explicit seek to the end of the file and call fd.tell() again. If the value is lower, then the file's been truncated under you. This is a safe operation as long as you keep the original file position around because you can always seek back to it after the check.
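The tell-and-seek check can be sketched as below. It assumes the file was opened in binary mode, where positions are plain byte offsets; was_truncated is a hypothetical name.

```python
import os


def was_truncated(f):
    """Return True if the file is now shorter than f's current read position."""
    pos = f.tell()
    end = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
    f.seek(pos)                   # restore the original position
    return end < pos
```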
Alternatively if you're using inotify anyway, you could just watch the parent directory for changes.
Note that files can be truncated to non-zero sizes, but I doubt that's likely to happen to a log file - the common cases will be being deleted and replaced, or truncated to zero. Also, I don't know how you'd detect the case that the file was truncated and then immediately filled back up to beyond your current position, except by remembering the most recent N characters and comparing them, but that's a pretty grotty thing to do. I think inotify will just tell you the file has been modified in that case.

Python - How to check if a file is used by another application?

I want to open a file which is periodically written to by another application. This application cannot be modified. I'd therefore like to only open the file when I know it is not being written to by another application.
Is there a pythonic way to do this? Otherwise, how do I achieve this in Unix and Windows?
edit: I'll try and clarify. Is there a way to check if the current file has been opened by another application?
I'd like to start with this question. Whether those other applications read or write is irrelevant for now.
I realize it is probably OS dependent, so this may not really be python related right now.
Will your python script desire to open the file for writing or for reading? Is the legacy application opening and closing the file between writes, or does it keep it open?
It is extremely important that we understand what the legacy application is doing, and what your python script is attempting to achieve.
This area of functionality is highly OS-dependent, and the fact that you have no control over the legacy application only makes things harder unfortunately. Whether there is a pythonic or non-pythonic way of doing this will probably be the least of your concerns - the hard question will be whether what you are trying to achieve will be possible at all.
UPDATE
OK, so knowing (from your comment) that:
"the legacy application is opening and closing the file every X minutes, but I do not want to assume that at t = t_0 + n*X + eps it already closed the file."
then the problem's parameters are changed. It can actually be done in an OS-independent way given a few assumptions, or as a combination of OS-dependent and OS-independent techniques. :)
OS-independent way: if it is safe to assume that the legacy application keeps the file open for at most some known quantity of time, say T seconds (e.g. opens the file, performs one write, then closes the file), and re-opens it more or less every X seconds, where X is larger than 2*T, then:
stat the file
subtract file's modification time from now(), yielding D
if T <= D < X then open the file and do what you need with it
This may be safe enough for your application. Safety increases as T/X decreases. On *nix you may have to double check /etc/ntp.conf for proper time-stepping vs. slew configuration (see tinker). For Windows see MSDN.
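The stat-based check above can be sketched as follows. The function and parameter names are hypothetical; t_max_open and x_period stand for T and X in seconds, and the optional now argument exists only to make the check repeatable.

```python
import os
import time


def probably_idle(path, t_max_open, x_period, now=None):
    """Heuristic: True when T <= D < X, where D is the time since last modification."""
    if now is None:
        now = time.time()
    d = now - os.stat(path).st_mtime
    return t_max_open <= d < x_period
```

For example, with T = 2 and X = 10, a file last modified 5 seconds ago passes the check, while one modified 1 second ago does not. It remains a heuristic and says nothing certain about what the other process is doing.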
Windows: in addition to (or in lieu of) the OS-independent method above, you may attempt to use either:
sharing (locking): this assumes that the legacy program also opens the file in shared mode (usually the default in Windows apps); moreover, if your application acquires the lock just as the legacy application is attempting the same (race condition), the legacy application will fail.
this is extremely intrusive and error prone. Unless both the new application and the legacy application need synchronized access for writing to the same file and you are willing to handle the possibility of the legacy application being denied opening of the file, do not use this method.
attempting to find out what files are open in the legacy application, using the same techniques as ProcessExplorer (the equivalent of *nix's lsof)
you are even more vulnerable to race conditions than the OS-independent technique
Linux/etc.: in addition to (or in lieu of) the OS-independent method above, you may attempt to use the same technique as lsof or, on some systems, simply check which file the symbolic link /proc/<pid>/fd/<fdes> points to
you are even more vulnerable to race conditions than the OS-independent technique
it is highly unlikely that the legacy application uses locking, but if it does, locking is not a real option unless the legacy application can handle a locked file gracefully (by blocking, not by failing) and your own application can guarantee that the file will not remain locked, blocking the legacy application, for extended periods of time.
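On Linux, the /proc check can be sketched as below. It assumes /proc is mounted and only sees processes whose descriptors the current user is allowed to inspect; pids_with_file_open is a hypothetical name. The result is a snapshot, so the race conditions mentioned above still apply.

```python
import os


def pids_with_file_open(path):
    """Return PIDs that have path open, by reading /proc/<pid>/fd (Linux only)."""
    target = os.path.realpath(path)
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.append(int(entry))
                    break
            except OSError:
                continue  # descriptor was closed while we were looking
    return pids
```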
UPDATE 2
If favouring the "check whether the legacy application has the file open" (intrusive approach prone to race conditions) then you can solve the said race condition by:
1. checking whether the legacy application has the file open (a la lsof or ProcessExplorer)
2. suspending the legacy application process
3. repeating the check in step 1 to confirm that the legacy application did not open the file between steps 1 and 2; delay and restart at step 1 if so, otherwise proceed to step 4
4. doing your business on the file -- ideally simply renaming it for subsequent, independent processing in order to keep the legacy application suspended for a minimal amount of time
5. resuming the legacy application process
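On POSIX systems, the five steps could be sketched with SIGSTOP and SIGCONT. This is an assumption about how to suspend a process, not something the steps prescribe, and it does not apply to Windows. has_file_open and action are caller-supplied callables with hypothetical names, and max_tries bounds the retries.

```python
import os
import signal
import time


def with_process_suspended(pid, has_file_open, action, retry_delay=1.0, max_tries=5):
    """Run action() while pid is stopped, but only if it does not hold the file.

    Returns True if action ran, False if every attempt found the file in use.
    """
    for _ in range(max_tries):
        if not has_file_open():               # step 1: check
            os.kill(pid, signal.SIGSTOP)      # step 2: suspend
            try:
                if not has_file_open():       # step 3: re-check
                    action()                  # step 4: e.g. rename the file
                    return True
            finally:
                os.kill(pid, signal.SIGCONT)  # step 5: resume, even on error
        time.sleep(retry_delay)
    return False
```

Keeping the resume in a finally block means the legacy process is not left stopped if action() raises.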
Unix does not have file locking as a default. The best suggestion I have for a Unix environment would be to look at the sources for the lsof command. It has deep knowledge about which processes have which files open. You could use that as the basis of your solution. Here are the Ubuntu sources for lsof.
One thing I've done is have python very temporarily rename the file. If we're able to rename it, then no other process is using it. I only tested this on Windows.
