I use tailer to parse logs in Python, but it breaks when log rotation happens on the server. What can I use instead? Running tail -f via popen is not a Pythonic way.
It shouldn't be that difficult to add log rotation functionality. For example, if you have:
for line in tailer.follow(open('test.txt')):
    print(line)
You could add a callback to a function that periodically checks for the existence of the next filename. If it exists, break out of the loop and begin a new one on the new file.
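A rough standalone sketch of that idea follows; the filenames are placeholders, the actual rotated name depends on your logrotate configuration, and this only handles a single rotation cleanly:

import os
import time

def follow_until_rotated(name, rotated_name):
    """Yield new lines from `name` until `rotated_name` shows up on disk."""
    with open(name) as f:
        f.seek(0, os.SEEK_END)            # start at the end, like tail -f
        while True:
            line = f.readline()
            if line:
                yield line
            elif os.path.exists(rotated_name):
                return                    # rotation detected: stop so the caller can reopen
            else:
                time.sleep(1.0)           # no new data yet

for line in follow_until_rotated('log.out', 'log.out.1'):
    print(line, end='')
# at this point 'log.out' has been rotated away; reopen the new 'log.out' and repeat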
When a logrotate event happens, the following occurs:
the log file's inode is unchanged, but the file is renamed (e.g. to log.out.1)
logrotate creates a new file with the same name (log.out) (I'm not sure :-)
the tailer module is still looking at the old inode.
You have to monitor the log file's inode value to follow the log correctly.
This is what 'tail -F' is for.
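A minimal sketch of that inode check, roughly what 'tail -F' does (the filename is a placeholder):

import os
import time

def follow(name):
    f = open(name)
    f.seek(0, os.SEEK_END)                 # start at the end, like tail -f
    while True:
        line = f.readline()
        if line:
            yield line
            continue
        try:
            # if the path now points at a different inode, the log was rotated: reopen
            if os.stat(name).st_ino != os.fstat(f.fileno()).st_ino:
                f.close()
                f = open(name)             # read the new file from the beginning
                continue
        except FileNotFoundError:
            pass                           # the new file may not have been created yet
        time.sleep(1.0)

for line in follow('log.out'):
    print(line, end='')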
As you can see in its source, the tailer module was not designed to follow log rotation of the source file: it is based on the recipe
http://code.activestate.com/recipes/157035/
and is not suitable for your task.
Please take a look at the comments on that recipe.
--
P.S. Or use mine, which is a wrapper around 'tail -f' :)
http://code.activestate.com/recipes/577398-tail-f-with-inode-monitor/
A feature will be launched which will create logs as the run proceeds. So I have to write a script that keeps checking a directory during the runtime of the above feature to see whether any log files have been created, and if I see logs being created, I will take further actions. The tricky part is that I do not have access to watch, cron jobs, or anything like that at the customer site, so any other suggestion would be appreciated. Also, I cannot install any Python libraries, so I need something very basic.
I haven't tried anything yet, but I'm looking to see if such a function exists; I'm planning to use a while loop to keep monitoring the directory.
If you cannot convince the client to install inotify-tools in order to have access to inotifywait, then you need to keep track not only of the existence of an output file, but, more importantly, of whether the output file is closed (is the process finished with that file).
In other words, the process creating the file would have a second "flag file" (call it ${process_name}.writing) which would be created before output starts and removed when output is complete.
As for the conditional logic: if output.txt exists and ${process_name}.writing does not, then output.txt is complete and usable.
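A stdlib-only sketch of that check in a polling loop, since watch and cron are unavailable ('output.txt' and 'myproc.writing' are placeholder names):

import os
import time

OUTPUT = 'output.txt'
FLAG = 'myproc.writing'   # created by the writer before it starts, removed when it finishes

while True:
    if os.path.exists(OUTPUT) and not os.path.exists(FLAG):
        print('output is complete:', OUTPUT)   # do the further actions here
        break
    time.sleep(5)                              # poll every few seconds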
You can always consider using the flock utility to test/assign reserved use of a file, in order to ensure there are no conflicts when picking up "closed" files.
I have a logger where I append a line for each downloaded file, because I need to monitor that.
But then I end up with a log full of these lines. I would like a solution where, when downloading 50,000 files from the server, the last line just changes to show the count of finished downloads and the last file downloaded, like this:
[timestamp] Started downloading 50 000 files.
[timestamp] Downloaded 1002th file - filename.csv
[timestamp] <Error downloading this file> # shown only on error, of course
[timestamp] Download finished.
This is not a terminal log, it is a log file, which I read actively with tail -f.
How can I make the line Downloaded 1002th file - filename.csv dynamic?
The easiest solution would be to rewrite the whole file at once after each download completes, truncating it before each such write. Otherwise you would have to work at a rather low level, using the
seek and tell functions of the Python file object, which would be rather overkill (https://docs.python.org/3/tutorial/inputoutput.html) just to save a few lines. With either solution, such changes to the file may not work properly with tail -f (if the file size does not change, tail may not update its position in the file; moreover, if you reopen the file in Python, the file descriptor will change and you may have to use tail -F). Maybe it would be enough to use watch cat?
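A rough sketch of the first approach, rewriting the whole status file after each download (the filename and line layout are just an illustration of the format in the question):

import datetime

def write_status(path, started_at, total, done, last_file):
    """Rewrite the whole status file; opening with 'w' truncates it first."""
    now = datetime.datetime.now().isoformat(timespec='seconds')
    with open(path, 'w') as f:
        f.write(f"[{started_at}] Started downloading {total} files.\n")
        f.write(f"[{now}] Downloaded {done}th file - {last_file}\n")

started = datetime.datetime.now().isoformat(timespec='seconds')
# call this after every completed download
write_status('download.log', started, 50000, 1002, 'filename.csv')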
Attempting to modify the log file is:
Very hard, as you're trying to modify it while continuously writing to it from Python:
If you do it from an external program, you'll have two writers on the same file section, which can cause big issues.
If you do it from Python, you won't actually be able to use the logging module, as you'd need to start creating custom file handlers and flags.
You'll cause issues with tail -F actually reading it.
Discouraged. A log is a log; you shouldn't go into random sections and modify them.
If you wish to monitor this easily, you have several options:
Write the "successfully downloaded file" using logging.debug instead of logging.info. Then monitor on logging.INFO level. I believe this is the best course of action. You can even write the debug to one log and info to another, and monitor the info.
Send a "successfully downloaded 10/100/1000 files". That'll batch the logging.info rows.
Use some kind of external monitoring. More of a custom solution, and a bit out of scope for the question.
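A minimal sketch of the first option, splitting the two levels into two files (paths and logger name are placeholders):

import logging

logger = logging.getLogger("downloader")
logger.setLevel(logging.DEBUG)

fmt = logging.Formatter("[%(asctime)s] %(message)s")

debug_handler = logging.FileHandler("downloads_debug.log")   # every downloaded file goes here
debug_handler.setLevel(logging.DEBUG)
debug_handler.setFormatter(fmt)

info_handler = logging.FileHandler("downloads.log")          # the file you tail -f
info_handler.setLevel(logging.INFO)
info_handler.setFormatter(fmt)

logger.addHandler(debug_handler)
logger.addHandler(info_handler)

logger.debug("Downloaded 1002th file - filename.csv")   # appears only in the debug log
logger.info("Download finished.")                        # appears in both logs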
I've been using WatchedFileHandler as my Python logging file handler so that I can rotate my logs with logrotate (on Ubuntu 14.04), which is what the docs say it's for. My logrotate config file looks like:
/path_to_logs/*.log {
daily
rotate 365
size 10M
compress
delaycompress
missingok
notifempty
su root root
}
Everything seemed to be working just fine. I'm using logstash to ship my logs to my Elasticsearch cluster and everything is great. I added a second log file for my debug logs, which gets rotated but is not watched by logstash. I noticed that when that file is rotated, Python just keeps writing to /path_to_debug_logs/*.log.1 and never starts writing to the new file. If I manually tail /path_to_debug_logs/*.log.1, it switches over instantly and starts writing to /path_to_debug_logs/*.log.
This seems REALLY weird to me.
I believe what is happening is that logstash is always tailing my non-debug logs, which somehow triggers the switchover to the new file after logrotate is called. If logrotate is called twice without a switchover, the log.1 file gets moved and compressed to log.2.gz, which Python can no longer log to, and logs are lost.
Clearly there are a bunch of hacky solutions to this (such as a cronjob that tails all my logs every now and then), but I feel like I must be doing something wrong.
I'm using WatchedFileHandler and logrotate instead of RotatingFileHandler for a number of reasons, but mainly because it will nicely compress my logs for me after rotation.
UPDATE:
I tried the horrible hack of adding a manual tail to the end of my log rotation config script.
sharedscripts
postrotate
    /usr/bin/tail -n 1 path_to_logs/*.log.1
endscript
Sure enough, this works most of the time, but it randomly fails sometimes for no clear reason, so it isn't a solution. I've also tried a number of less hacky solutions where I've modified the way WatchedFileHandler checks whether the file has changed, but no luck.
I'm fairly sure the root of my problem is that the logs are stored on a network drive, which is somehow confusing the file system.
I'm moving my rotation to python with RotatingFileHandler, but if anyone knows the proper way to handle this I'd love to know.
Use the copytruncate option of logrotate. From the docs:
copytruncate
Truncate the original log file in place after creating a copy, instead of moving the old log file and optionally creating a new one. It can be used when some program cannot be told to close its logfile and thus might continue writing (appending) to the previous log file forever. Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost. When this option is used, the create option will have no effect, as the old log file stays in place.
WatchedFileHandler does a rollover when a device and/or inode change is detected in the log file just before writing to it. Perhaps the file which isn't being watched by logstash doesn't see a change in its device/inode? That would explain why the handler keeps on writing to it.
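For reference, the check WatchedFileHandler performs before each write boils down to roughly this (simplified sketch):

import os

def rotated(path, stream):
    """True if `path` now refers to a different file than the one `stream` has open."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return True                       # the file was moved away and not recreated yet
    cur = os.fstat(stream.fileno())
    return (st.st_dev, st.st_ino) != (cur.st_dev, cur.st_ino)

If the network drive serves stale stat() results until another process touches the file, that could explain why the unwatched log only switches over after a manual tail, though that is speculation.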
I have a program that creates files in a specific directory.
When those files are ready, I run LaTeX to produce a .pdf file.
So, my question is: how can I use this directory change as a trigger
to call LaTeX, using a shell script or a Python script?
Best Regards
inotify replaces dnotify.
Why?
...dnotify requires opening one file descriptor for each directory that you intend to watch for changes...
Additionally, the file descriptor pins the directory, disallowing the backing device to be unmounted, which causes problems in scenarios involving removable media. When using inotify, if you are watching a file on a file system that is unmounted, the watch is automatically removed and you receive an unmount event.
...and more.
More Why?
Unlike its ancestor dnotify, inotify doesn't complicate your work with various limitations. For example, if you watch files on removable media, these files aren't locked. By comparison, dnotify requires the files themselves to be open and thus really "locks" them (hampering unmounting of the media).
Reference
Is dnotify what you need?
make on Unix systems is usually used to track, by date, what needs rebuilding when files have changed. I normally use a rather good makefile for this job. There seems to be another alternative around on Google Code too.
You not only need to check for changes, but also need to know that all changes are complete before running LaTeX. For example, if you start LaTeX after the first file has been modified while more changes are still pending, you'll be using partial data and will have to re-run later.
Wait for your first program to complete:
#!/bin/bash
first-program &&
run-after-changes-complete
Using && means the second command is only executed if the first completes successfully (a zero exit code). Because this simple script will always run the second command even if the first doesn't change any files, you can incorporate this into whatever build system you are already familiar with, such as make.
Python FAM is a Python interface for FAM (File Alteration Monitor)
You can also have a look at Pyinotify, which is a module for monitoring file system changes.
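A minimal Pyinotify sketch, reacting when a file in the watched directory is closed after writing (the directory path and the action are placeholders):

import pyinotify

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        print('finished writing:', event.pathname)   # run LaTeX here

wm = pyinotify.WatchManager()
wm.add_watch('/path/to/directory', pyinotify.IN_CLOSE_WRITE)
pyinotify.Notifier(wm, Handler()).loop()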
Not much of a Python man myself, but in a pinch, assuming you're on Linux, you could periodically shell out to "ls -lrt /path/to/directory" (list the directory contents sorted by last modified) and compare the results of the last two calls for a difference. If there is one, then something changed. Not very detailed, but it gets the job done.
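A crude polling version of that idea in Python (the directory path and interval are placeholders):

import subprocess
import time

def snapshot(path):
    # capture the output of "ls -lrt" for later comparison
    return subprocess.run(['ls', '-lrt', path], capture_output=True, text=True).stdout

previous = snapshot('/path/to/directory')
while True:
    time.sleep(10)
    current = snapshot('/path/to/directory')
    if current != previous:
        print('directory changed')        # trigger LaTeX (or anything else) here
        previous = current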
You can use the native Python module hashlib, which implements the MD5 algorithm:
>>> import hashlib
>>> import os
>>> m = hashlib.md5()
>>> for root, dirs, files in os.walk(path):  # path is the directory to monitor
...     for file_read in files:
...         full_path = os.path.join(root, file_read)
...         for line in open(full_path, 'rb'):  # read bytes, as update() requires
...             m.update(line)
...
>>> m.digest()
b'pQ\x1b\xb9oC\x9bl\xea\xbf\x1d\xda\x16\xfe8\xcf'
You can save this result in a file or a variable, and compare it to the result of the next run. This will detect changes in any files, in any sub-directory.
This does not take file permission changes into account; if you need to monitor those as well, it could be addressed by appending a string representing the permissions (accessible via os.stat, for instance; attributes depend on your system) to the m variable.
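For example, one could also feed the mode bits into the hash inside the inner loop above (a hypothetical addition):

m.update(str(os.stat(full_path).st_mode).encode())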
I'm trying to use SCons to build a LaTeX document. In particular, I want to get SCons to invoke a Python program that generates a file containing a table that is \input{} into the main document. I've looked over the SCons documentation, but it is not immediately clear to me what I need to do.
What I wish to achieve is essentially what you would get with this makefile:
document.pdf: table.tex
	pdflatex document.tex

table.tex:
	python table_generator.py
How can I express this in scons?
Something along these lines should do -
env.Command ('document.tex', '', 'python table_generator.py')
env.PDF ('document.pdf', 'document.tex')
It declares that 'document.tex' is generated by calling the Python script, and requests that a PDF document be created from this generated 'document.tex' file.
Note that this is in spirit only; it may require some tweaking. In particular, I'm not certain what semantics you would want for the generation of 'document.tex': should it be generated every time? Only when it doesn't exist? When some other file changes? (In that case you would want to add that dependency as the second argument to Command().)
In addition, the output of Command() can be used as input to PDF() if desired. For clarity, I didn't do that.
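A possible SConstruct fragment combining both calls, with the generator script listed as a source so the table is rebuilt when it changes, and an explicit dependency so the PDF waits for the table (file names are taken from the question; this is a sketch, not a tested build file):

env = Environment()
table = env.Command('table.tex', 'table_generator.py', 'python table_generator.py')
pdf = env.PDF('document.pdf', 'document.tex')
env.Depends(pdf, table)   # document.tex \input{}s table.tex, so build the table first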
In this simple case, the easiest way is to just use the subprocess module
from subprocess import call
call("python table_generator.py")
call("pdflatex document.tex")
Regardless of where in your SConstruct file these lines are placed, they will happen before any of the compiling and linking performed by SCons.
The downside is that these commands will be executed every time you run SCons, rather than only when the files have changed, which is what would happen in your example Makefile. So if those commands take a long time to run, this wouldn't be a good solution.
If you really need to only run these commands when the files have changed, look at the SCons manual section Writing Your Own Builders.