Dynamic/changing last line in log file - python

I have a logger to which I append a line for each downloaded file, because I need to monitor that.
But then I end up with a log full of these lines. When downloading 50,000 files from a server, I would like the last line to simply update with the count of finished downloads and the name of the last file downloaded, like this:
[timestamp] Started downloading 50 000 files.
[timestamp] Downloaded 1002nd file - filename.csv
[timestamp] <Error downloading this file>  # shown only on error, of course
[timestamp] Download finished.
This is not a terminal log, it is a log file, which I read actively with tail -f.
How can I make the line Downloaded 1002nd file - filename.csv dynamic?

The easiest solution would be to rewrite the whole file at once after each download completes, truncating it before each such write. Otherwise you would have to work at a rather low level, using the seek and tell methods of a Python file object (https://docs.python.org/3/tutorial/inputoutput.html), which is rather heavy machinery just to keep a few lines up to date. Either way, such in-place changes may not work properly with tail -f: if the file size does not change, tail may not update its position in the file, and if you reopen the file from Python, the file descriptor changes and you may have to use tail -F instead. Maybe it would be enough to use watch cat?
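A minimal sketch of the seek/tell plus truncate idea, assuming the log is just a header plus a single status line that gets overwritten (the file names and messages are made up):

import time

def ts():
    return time.strftime('[%Y-%m-%d %H:%M:%S]')

with open('download.log', 'w') as log:
    log.write('%s Started downloading 50000 files.\n' % ts())
    log.flush()
    status_pos = log.tell()              # remember where the status line starts
    for i, name in enumerate(['a.csv', 'b.csv', 'c.csv'], start=1):
        log.seek(status_pos)
        log.truncate()                   # drop the previous status line
        log.write('%s Downloaded file %d - %s\n' % (ts(), i, name))
        log.flush()                      # so tail -f / watch cat can see it
    log.write('%s Download finished.\n' % ts())

Because the file shrinks and grows in place, tail -f may still behave oddly, as mentioned above.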

Attempting to modify the log file in place is:
Very hard, because you would be modifying it while Python is continuously writing to it:
If you do it from an external program, you will have two writers on the same file region, which can cause serious problems.
If you do it from Python, you won't really be able to use the logging module any more, since you would need custom file handlers and flags.
You will also confuse the tail -F that is reading the file.
Discouraged. A log is a log; you shouldn't go back and modify arbitrary sections of it.
If you wish to monitor this easily, you have several options:
Write the "successfully downloaded file" lines with logging.debug instead of logging.info, then monitor at the logging.INFO level. I believe this is the best course of action. You can even send the debug records to one log and the info records to another, and monitor the info log (see the sketch after this list).
Send a "successfully downloaded 10/100/1000 files" message. That batches the logging.info rows.
Use any kind of external monitoring. That is more of a custom solution and a bit out of scope for the question.
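A sketch of the two-logger idea combined with batching; the log file names, the filenames list, and the download() function are placeholders, not part of the original question:

import logging

def download(name):
    pass                                   # placeholder for the real download call

filenames = ['file%05d.csv' % i for i in range(1, 50001)]   # placeholder list
total = len(filenames)

debug_handler = logging.FileHandler('downloads.debug.log')  # every file logged here
debug_handler.setLevel(logging.DEBUG)
info_handler = logging.FileHandler('downloads.info.log')    # the one you tail
info_handler.setLevel(logging.INFO)

fmt = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
debug_handler.setFormatter(fmt)
info_handler.setFormatter(fmt)

log = logging.getLogger('downloader')
log.setLevel(logging.DEBUG)
log.addHandler(debug_handler)
log.addHandler(info_handler)

log.info('Started downloading %d files.', total)
for i, name in enumerate(filenames, start=1):
    download(name)
    log.debug('Downloaded file %d - %s', i, name)
    if i % 1000 == 0:                      # batch the INFO rows
        log.info('Successfully downloaded %d/%d files.', i, total)
log.info('Download finished.')

Tailing downloads.info.log then gives you one line per thousand files instead of one line per file.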

Related

How to make sure a file is completed before copying it?

An application A (out of my control) writes a file into a directory.
After the file is written, I want to back it up somewhere else with a Python script of mine.
Question: how can I be sure that the file is complete, and that application A is not still writing it, so that I know whether to wait for its completion? I am worried I could copy a partial file.
I wanted to use shutil.copyfile(src, dst), but I don't know whether that is safe or whether I should check the file in some other way first.
In general, you can't.
Because you don't have the information needed to solve the problem.
If you have to know that a file was completely transferred/created/written/whatever successfully, the creator has to send you a signal somehow, because only the creator has that information. From the receiving side, there's in general no way to infer that a file has been completely transferred. You can try to guess, but that's all it is. You can't in general tell a complete transfer from one where the connection was lost, for example.
So you need a signal of some sort from the sender.
One common way is to use a rename operation from something like filename.xfr to filename, or from an "in work" directory to the one you're watching. Since most operating systems implement such rename operations atomically, if the sender only does the rename when the transfer is successfully done, you'll only process complete files that have been successfully transferred.
Another common signal is to send a "done" flag file, such as sending filename.done once filename has been successfully sent.
Since you don't control the sender, you can't reliably solve this problem by watching for files.
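For completeness, if you did control the sender, a minimal sketch of the rename handshake described above could look like this (the paths and the .xfr suffix are just conventions for illustration, not anything application A actually does):

import os
import shutil

def publish(tmp_path, final_path):
    # Sender side: write under a temporary name, then rename atomically when done.
    with open(tmp_path, 'wb') as f:
        f.write(b'...payload...')
    os.rename(tmp_path, final_path)        # atomic on the same filesystem

def backup_completed(watch_dir, backup_dir):
    # Receiver side: only touch files that no longer carry the "in work" suffix.
    for name in os.listdir(watch_dir):
        if name.endswith('.xfr'):          # still being written
            continue
        shutil.copyfile(os.path.join(watch_dir, name),
                        os.path.join(backup_dir, name))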

python WatchedFileHandler still writing to old file after rotation

I've been using WatchedFileHandler as my Python logging file handler so that I can rotate my logs with logrotate (on Ubuntu 14.04), which, as you know, is what the docs say it is for. My logrotate config file looks like:
/path_to_logs/*.log {
daily
rotate 365
size 10M
compress
delaycompress
missingok
notifempty
su root root
}
Everything seemed to be working just fine. I'm using logstash to ship my logs to my Elasticsearch cluster and everything is great. I added a second log file for my debug logs, which gets rotated but is not watched by logstash. I noticed that when that file is rotated, Python just keeps writing to /path_to_debug_logs/*.log.1 and never starts writing to the new file. If I manually tail /path_to_debug_logs/*.log.1, it switches over instantly and starts writing to /path_to_debug_logs/*.log.
This seems REALLY weird to me.
I believe what is happening is that logstash is always tailing my non-debug logs, which somehow triggers the switchover to the new file after logrotate runs. If logrotate is called twice without a switchover, the log.1 file gets moved and compressed to log.2.gz, which Python can no longer log to, and those log records are lost.
Clearly there are a bunch of hacky solutions to this (such as a cronjob that tails all my logs every now and then), but I feel like I must be doing something wrong.
I'm using WatchedFileHandler and logrotate instead of RotatingFileHandler for a number of reasons, but mainly because it will nicely compress my logs for me after rotation.
UPDATE:
I tried the horrible hack of adding a manual tail to the end of my log rotation config script.
sharedscripts
postrotate
/usr/bin/tail -n 1 path_to_logs/*.log.1
endscript
Sure enough this works most of the time, but it sometimes fails at random for no clear reason, so it isn't a solution. I've also tried a number of less hacky approaches where I modified the way WatchedFileHandler checks whether the file has changed, but no luck.
I'm fairly sure the root of my problem is that the logs are stored on a network drive, which is somehow confusing the file system.
I'm moving my rotation to python with RotatingFileHandler, but if anyone knows the proper way to handle this I'd love to know.
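For reference, the RotatingFileHandler route mentioned in the update would look roughly like this; the path, size limit, and backup count are placeholders mirroring the logrotate config above:

import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler('/path_to_logs/debug.log',      # placeholder path
                              maxBytes=10 * 1024 * 1024,      # ~10M, as in the config
                              backupCount=365)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))

logger = logging.getLogger('app')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

Note that RotatingFileHandler does not compress the rotated files; compression would have to be added separately.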
Use the copytruncate option of logrotate. From the docs:
copytruncate
Truncate the original log file in place after creating a copy, instead of moving the old log file and optionally creating a new one. It can be used when some program can not be told to close its logfile and thus might continue writing (appending) to the previous log file forever. Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost. When this option is used, the create option will have no effect, as the old log file stays in place.
WatchedFileHandler does a rollover when it detects a change in the log file's device and/or inode just before writing to it. Perhaps the file that isn't being watched by logstash doesn't show a change in its device/inode? That would explain why the handler keeps writing to the old file.
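Roughly, the check WatchedFileHandler performs before each write looks like this; a simplified sketch, not the actual CPython code:

import os

def file_was_rotated(path, last_dev, last_ino):
    # Compare the device/inode of the path on disk with the values recorded
    # when the handler opened its stream.
    try:
        st = os.stat(path)
    except OSError:
        return True                        # the file is gone, reopen it
    return (st.st_dev, st.st_ino) != (last_dev, last_ino)

On some network filesystems the device and inode numbers may not change the way the handler expects, which would fit the network-drive observation above.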

Can I save a text file in python without closing it?

I am writing a program in which I would like to be able to view a log file before the program is complete. I have noticed that, in Python (2.7 and 3), file.write() does not immediately save the file, whereas file.close() does. I don't want to create a million little log files with unique names, but I would like to be able to view the updated log file before the program is finished. How can I do this?
Now, to be clear, I am scripting with Ansys Workbench (trying to batch some CFX runs). Here's a link to a tutorial that shows what I'm talking about. They appear to have wrapped Python, and by running the script I can send commands to the various modules. While the script is running there is no console on screen and it appears to swallow all of the print statements, so the only way I can report what's happening is via a file. Also, I don't want to bring up a console window, because eventually I will just run the program in batch mode (no interface). But the simulations take a long time to run and I can't wait for the program to finish before checking on what's happening.
You would need this:
import os

file.flush()
# Typically the flush() above is enough; fsync() additionally asks the OS
# to push the data all the way to disk.
os.fsync(file.fileno())
Check this: http://docs.python.org/2/library/stdtypes.html#file.flush
file.flush()
Flush the internal buffer, like stdio's fflush(). This may be a no-op on some file-like objects.
Note flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.
EDITED: See this question for detailed explanations: what exactly the python's file.flush() is doing?
Does file.flush() after each write help?
Hannu
This will write the file to disk immediately:
file.flush()
os.fsync(file.fileno())
According to the documentation https://docs.python.org/2/library/os.html#os.fsync
Force write of file with filedescriptor fd to disk. On Unix, this calls the native fsync() function; on Windows, the MS _commit() function.
If you’re starting with a Python file object f, first do f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal buffers associated with f are written to disk.
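Putting that together, a small helper that makes each line visible to tail -f as soon as it is written might look like this (a sketch; the file name and messages are made up):

import os

def write_line(f, text):
    f.write(text + '\n')
    f.flush()                  # push Python's internal buffer to the OS
    os.fsync(f.fileno())       # ask the OS to push the data to disk

with open('progress.log', 'a') as log:
    write_line(log, 'run 42: meshing started')
    write_line(log, 'run 42: solver converged')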

python, tailer and logrotate

I use tailer to parse logs in Python, but it breaks when logs are rotated on the server. What can I use instead? Running tail -f via popen is not a Pythonic way.
It shouldn't be that difficult to add log rotation functionality. For example, if you have:
for line in tailer.follow(open('test.txt')):
    print(line)
You could add a callback to a function that periodically checks for the existence of the next filename. If it exists, break out of the loop and start a new one on the new file.
Upon a logrotate event, the following happens:
The log file's inode is unchanged, but the log is renamed to a new name (e.g. log.out.1).
logrotate creates a new file with the same name (log.out) (I'm not sure :-).
The tailer module will still be looking at the old file's inode.
You have to monitor the log file's inode for correct log following.
This is what tail -F is for.
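A rough sketch of following a file while watching its inode, in the spirit of tail -F (this polls once a second and does not drain any lines left in the rotated file; the function name and details are mine, not tailer's API):

import os
import time

def follow_rotating(path, poll=1.0):
    f = open(path)
    f.seek(0, os.SEEK_END)
    inode = os.fstat(f.fileno()).st_ino
    while True:
        line = f.readline()
        if line:
            yield line
            continue
        try:
            if os.stat(path).st_ino != inode:   # rotated and recreated
                f.close()
                f = open(path)
                inode = os.fstat(f.fileno()).st_ino
                continue
        except OSError:
            pass                                # new file not created yet
        time.sleep(poll)

# for line in follow_rotating('log.out'):
#     print(line, end='')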
As you can see in its source, the tailer module wasn't designed to follow log rotation of the source file: it is based on this recipe
http://code.activestate.com/recipes/157035/
and is not useful for your task.
Please take a look at the comments on the source recipe.
--
P.S. Or use mine, which is a wrapper around tail -f :)
http://code.activestate.com/recipes/577398-tail-f-with-inode-monitor/

Check if the directory content has changed with shell script or python

I have a program that creates files in a specific directory.
When those files are ready, I run LaTeX to produce a .pdf file.
So my question is: how can I use this directory change as a trigger to call LaTeX, using a shell script or a Python script?
Best regards
inotify replaces dnotify.
Why?
...dnotify requires opening one file descriptor for each directory that you intend to watch for changes...
Additionally, the file descriptor pins the directory, disallowing the backing device to be unmounted, which causes problems in scenarios involving removable media. When using inotify, if you are watching a file on a file system that is unmounted, the watch is automatically removed and you receive an unmount event.
...and more.
More Why?
Unlike its ancestor dnotify, inotify doesn't complicate your work with various limitations. For example, if you watch files on removable media, these files aren't locked. In comparison, dnotify requires the files themselves to be open and thus really "locks" them (hampering unmounting of the media).
Reference
Is dnotify what you need?
Make on Unix systems is usually used to track, by date, what needs rebuilding when files have changed. I normally use a rather good makefile for this job. There also seems to be another alternative around on Google Code.
You not only need to check for changes, but need to know that all changes are complete before running LaTeX. For example, if you start LaTeX after the first file has been modified and while more changes are still pending, you'll be using partial data and have to re-run later.
Wait for your first program to complete:
#!/bin/bash
first-program &&
run-after-changes-complete
Using && means the second command is only executed if the first completes successfully (with a zero exit code). Because this simple script will run the second command even if the first didn't actually change any files, you may want to incorporate it into whatever build system you are already familiar with, such as make.
Python FAM is a Python interface for FAM (File Alteration Monitor)
You can also have a look at Pyinotify, which is a module for monitoring file system changes.
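A minimal pyinotify sketch in that spirit; the watched directory and the LaTeX command are placeholders:

import subprocess
import pyinotify

WATCH_DIR = '/path/to/output'                  # placeholder

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # A file in the watched directory was written and closed; rebuild the PDF.
        subprocess.call(['pdflatex', 'report.tex'])   # placeholder command

wm = pyinotify.WatchManager()
notifier = pyinotify.Notifier(wm, Handler())
wm.add_watch(WATCH_DIR, pyinotify.IN_CLOSE_WRITE)
notifier.loop()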
Not much of a Python man myself, but in a pinch, assuming you're on Linux, you could periodically shell out to "ls -lrt /path/to/directory" (list the directory contents sorted by last-modified time) and compare the results of the last two calls for a difference. If they differ, there was a change. Not very detailed, but it gets the job done.
You can use the standard library module hashlib, which implements the MD5 algorithm:
>>> import hashlib
>>> import os
>>> m = hashlib.md5()
>>> for root, dirs, files in os.walk(path):
...     for file_read in files:
...         full_path = os.path.join(root, file_read)
...         with open(full_path, 'rb') as f:
...             for line in f:
...                 m.update(line)
...
>>> m.digest()
'pQ\x1b\xb9oC\x9bl\xea\xbf\x1d\xda\x16\xfe8\xcf'
You can save this result in a file or a variable and compare it to the result of the next run. This will detect changes in any file, in any sub-directory.
This does not take into account file permission changes; if you need to monitor those as well, you could append a string representing the permissions (available via os.stat, for instance; the exact attributes depend on your system) to the m variable before computing the digest, as sketched below.
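For example, folding the permission bits into the hash could look like this (a sketch; path is the same directory as above, and you can add whichever os.stat attributes matter to you):

import hashlib
import os
import stat

m = hashlib.md5()
for root, dirs, files in os.walk(path):
    for name in sorted(files):                 # fixed order for a stable digest
        full_path = os.path.join(root, name)
        mode = stat.S_IMODE(os.stat(full_path).st_mode)
        m.update(('%s %o\n' % (full_path, mode)).encode())   # path + permissions
        with open(full_path, 'rb') as f:
            m.update(f.read())
print(m.hexdigest())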
