I've been using WatchedFileHandler as my Python logging file handler so that I can rotate my logs with logrotate (on Ubuntu 14.04), which is what the docs say it's for. My logrotate config file looks like:
/path_to_logs/*.log {
daily
rotate 365
size 10M
compress
delaycompress
missingok
notifempty
su root root
}
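For reference, my Python logging setup is roughly the following (the paths, logger name, and format here are illustrative, not the real ones):
import logging
from logging.handlers import WatchedFileHandler

logger = logging.getLogger("myapp")
logger.setLevel(logging.DEBUG)

formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")

# Main log, shipped by logstash
main_handler = WatchedFileHandler("/path_to_logs/app.log")
main_handler.setLevel(logging.INFO)
main_handler.setFormatter(formatter)
logger.addHandler(main_handler)

# Debug log, rotated by logrotate but not watched by logstash
debug_handler = WatchedFileHandler("/path_to_debug_logs/app_debug.log")
debug_handler.setLevel(logging.DEBUG)
debug_handler.setFormatter(formatter)
logger.addHandler(debug_handler)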
Everything seemed to be working just fine. I'm using logstash to ship my logs to my elasticsearch cluster and everything is great. I added a second log file for my debug logs, which gets rotated but is not watched by logstash. I noticed that when that file is rotated, Python just keeps writing to /path_to_debug_logs/*.log.1 and never starts writing to the new file. If I manually tail /path_to_debug_logs/*.log.1, it switches over instantly and starts writing to /path_to_debug_logs/*.log.
This seems REALLY weird to me.
I believe what is happening is that logstash is always tailing my non-debug logs, which somehow triggers the switch over to the new file after logrotate is called. If logrotate is called twice without a switch over, the log.1 file gets moved and compressed to log.2.gz, which Python can no longer log to, and logs are lost.
Clearly there are a bunch of hacky solutions to this (such as a cronjob that tails all my logs every now and then), but I feel like I must be doing something wrong.
I'm using WatchedFileHandler and logrotate instead of RotatingFileHandler for a number of reasons, but mainly because it will nicely compress my logs for me after rotation.
UPDATE:
I tried the horrible hack of adding a manual tail to the end of my log rotation config script.
sharedscripts
postrotate
/usr/bin/tail -n 1 path_to_logs/*.log.1
endscript
Sure enough this works most of the time, but it randomly fails sometimes for no clear reason, so it isn't a solution. I've also tried a number of less hacky solutions where I've modified the way WatchedFileHandler checks whether the file has changed, but no luck.
I'm fairly sure the root of my problem is that the logs are stored on a network drive, which is somehow confusing the file system.
I'm moving my rotation to python with RotatingFileHandler, but if anyone knows the proper way to handle this I'd love to know.
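For anyone doing the same, here's a minimal sketch of that setup (file names and sizes are illustrative); on Python 3.3+ the handler's rotator/namer hooks can bring back the gzip compression that logrotate was providing:
import gzip
import logging
import os
import shutil
from logging.handlers import RotatingFileHandler

def gzip_rotator(source, dest):
    # Compress the rotated file and remove the original, mimicking
    # logrotate's "compress" option.
    with open(source, "rb") as f_in, gzip.open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    os.remove(source)

handler = RotatingFileHandler("/path_to_debug_logs/app_debug.log",
                              maxBytes=10 * 1024 * 1024, backupCount=365)
handler.namer = lambda name: name + ".gz"   # app_debug.log.1 -> app_debug.log.1.gz
handler.rotator = gzip_rotator

logger = logging.getLogger("myapp")
logger.addHandler(handler)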
Use the copytruncate option of logrotate. From the docs:
copytruncate
Truncate the original log file in place after creating a copy, instead of moving the old log file and optionally creating a new one. It can be used when some program cannot be told to close its logfile and thus might continue writing (appending) to the previous log file forever. Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost. When this option is used, the create option will have no effect, as the old log file stays in place.
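Applied to the config in the question, that would be something like (a sketch; keep whatever other options you need):
/path_to_logs/*.log {
daily
rotate 365
size 10M
copytruncate
compress
delaycompress
missingok
notifempty
su root root
}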
WatchedFileHandler does a rollover when a device and/or inode change is detected in the log file just before writing to it. Perhaps the file which isn't being watched by logstash doesn't see a change in its device/inode? That would explain why the handler keeps on writing to it.
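For reference, the check WatchedFileHandler performs before each write is essentially a stat comparison along these lines (a simplified sketch, not the actual source):
import os

def file_was_rotated(path, last_dev, last_ino):
    # True if the file at `path` is now a different file (different
    # device or inode) than the one the handler has open.
    try:
        st = os.stat(path)
    except OSError:
        return True  # file was moved or removed, so a reopen is needed
    return (st.st_dev, st.st_ino) != (last_dev, last_ino)
On a network drive the stat results can be cached by the file system, so the handler may not see the inode change promptly, which would fit the suspicion in the update above.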
Related
I have a logger, where I append a line for each downloaded file, because I need to monitor that.
But then I end up with a log full of these. I would like a solution where, when downloading 50,000 files from the server, the last line would just change to show the count of downloads finished and the last file downloaded, like this:
[timestamp] Started downloading 50 000 files.
[timestamp] Downloaded 1002th file - filename.csv
[timestamp] <Error downloading this file> #show only when err ofc
[timestamp] Download finished.
This is not a terminal log, it is a log file, which I read actively with tail -f.
How can I make the line Downloaded 1002th file - filename.csv dynamic?
The easiest solution would be to write the whole file at once after each download completes, truncating it before each such write. Otherwise you would have to work at a rather low level, using the seek and tell functions of the Python file object (https://docs.python.org/3/tutorial/inputoutput.html), which would be overkill just to save a few lines. In either case, such in-place changes may not work properly with tail -f (if the file size does not change, tail may not update its position in the file; moreover, if you reopen the file in Python, the file descriptor will change and you may have to use tail -F). Maybe it would be enough to use watch cat?
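For illustration, a minimal sketch of that rewrite-the-whole-file approach (the file name and message layout are made up):
import time

STATUS_FILE = "download_status.log"  # illustrative path

def write_status(total, done, last_file, error=None):
    # Rewrite the whole status file on every update instead of appending.
    now = time.strftime("%Y-%m-%d %H:%M:%S")
    lines = ["[{}] Started downloading {} files.".format(now, total),
             "[{}] Downloaded {} of {} files - {}".format(now, done, total, last_file)]
    if error:
        lines.append("[{}] <Error downloading {}: {}>".format(now, last_file, error))
    if done == total:
        lines.append("[{}] Download finished.".format(now))
    # Opening with "w" truncates the file, so the previous contents are replaced.
    with open(STATUS_FILE, "w") as f:
        f.write("\n".join(lines) + "\n")
As noted above, tail -f may not follow these rewrites reliably; watch cat download_status.log (or tail -F) is a safer way to view it.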
Attempting to modify the log file is:
Very hard, as you're trying to modify it while Python is continuously writing to it:
If you do it from an external program, you'll have two writers on the same section of the file, which can cause big problems.
If you do it from Python, you won't actually be able to use the logging module, as you'd need to start creating custom file handlers and flags.
You'll also cause issues for tail -F actually reading it.
Discouraged. A log is a log; you shouldn't go into random sections and modify them.
If you wish to easily monitor this, you have several options:
Write the "successfully downloaded file" using logging.debug instead of logging.info. Then monitor on logging.INFO level. I believe this is the best course of action. You can even write the debug to one log and info to another, and monitor the info.
Send a "successfully downloaded 10/100/1000 files". That'll batch the logging.info rows.
Use any type of external monitoring. More of a custom solution, a bit out of scope for the question.
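A minimal sketch of that first option with two handlers (file names are illustrative):
import logging

logger = logging.getLogger("downloader")
logger.setLevel(logging.DEBUG)

formatter = logging.Formatter("[%(asctime)s] %(message)s")

# Everything, including the per-file messages, goes to the debug log.
debug_handler = logging.FileHandler("downloads_debug.log")
debug_handler.setLevel(logging.DEBUG)
debug_handler.setFormatter(formatter)
logger.addHandler(debug_handler)

# Only the milestones go to the log you actually tail.
info_handler = logging.FileHandler("downloads.log")
info_handler.setLevel(logging.INFO)
info_handler.setFormatter(formatter)
logger.addHandler(info_handler)

logger.info("Started downloading 50 000 files.")
logger.debug("Downloaded 1002th file - filename.csv")  # per-file, debug log only
logger.info("Download finished.")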
I have a script I'm running a bunch of times that generates and logs data in json files. These take days to run and I need to run several dozen test cases. I log progress in the json files for post-processing, and I'd like to check in occasionally to see how long a run has left. This is all single-threaded, but I've dealt with multiprocessing enough to be scared of opening the file while it's being written, for fear that viewing it will place a temporary lock on the file.
Is it safe to view the json in a linux terminal using nano log_file.json while my Python scripts are running and could attempt to write to the log at any time?
If it is not safe, are there any alternatives?
I'm worried if Python tries to record an entry that it could be lost or throw an error while I'm viewing progress. Viewing only, no saving obviously. I'd love to check in on progress to switch between test cases faster, but I really don't want to raise an error that loses days of progress if it's unable to write to the json.
Sorry if this is a duplicate, I tried searching but I'm not sure what to even search for this question.
You can use the tail command in the terminal to view the logs. The full command is:
tail -F <path_to_file>
It will show the last few lines of the file and keep showing new data as it is written to the file.
An application A (out of my control) writes a file into a directory.
After the file is written I want to back it up somewhere else with a python script of mine.
Question: how can I be sure that the file is complete, rather than application A still writing it, in which case I should wait until it finishes? I am worried I could copy a partial file...
I wanted to use shutil.copyfile(src, dst), but I don't know whether that is safe or whether I should check the file to copy in some other way.
In general, you can't.
Because you don't have the information needed to solve the problem.
If you have to know that a file was completely transferred/created/written/whatever successfully, the creator has to send you a signal somehow, because only the creator has that information. From the receiving side, there's in general no way to infer that a file has been completely transferred. You can try to guess, but that's all it is. You can't in general tell a complete transfer from one where the connection was lost, for example.
So you need a signal of some sort from the sender.
One common way is to use a rename operation from something like filename.xfr to filename, or from an "in work" directory to the one you're watching. Since most operating systems implement such rename operations atomically, if the sender only does the rename when the transfer is successfully done, you'll only process complete files that have been successfully transferred.
Another common signal is to send a "done" flag file, such as sending filename.done once filename has been successfully sent.
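If the sender could be made to drop such a flag file, the receiving side would look roughly like this (a sketch; names are made up for illustration):
import os
import shutil

def backup_when_done(src, dst):
    # Only copy once the sender has signalled completion by creating
    # a matching ".done" flag file next to the data file.
    done_flag = src + ".done"
    if os.path.exists(done_flag):
        shutil.copyfile(src, dst)
        return True
    return False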
Since you don't control the sender, you can't reliably solve this problem by watching for files.
I have written a python script that is designed to run forever. I load the script into a folder that I made on my remote server, which is running Debian Wheezy 7.0. The code runs, but it only runs for 3 to 4 hours and then it just stops, and I do not have any log information about why it stopped. When I come back and check the running processes, it is not there. Is this a problem with where I am running the python file from? The script simply has a while loop and writes to an external csv file. The file runs from /var/pythonscript, a custom folder that I made. There is no error that I receive, and the only way I know how long the code ran is by the timestamp on the csv file. I run the .py file by ssh'ing to the server and running sudo python scriptname. I would also like to know the best place in the Debian directory tree to run python files from, and any limitations concerning that. Any help would be much appreciated.
Basically you're stuffed.
Your problem is:
You have a script, which produces no error messages, no logging, and no other diagnostic information other than a single timestamp, on an output file.
Something has gone wrong.
In this case, you have no means of finding out what the issue was. I suggest any of the following:
Add logging or diagnostic information to the script.
Contact the developer of the script and get them to find a way of determining the issue.
If you can't do option 1 or 2 above, delete the evidently worthless script and consider an alternative way of doing your task.
Now, if the script does have logging, or other diagnostic data, but you delete or throw them away, then that's your problem and you need to stop discarding this useful information.
EDIT (following comment).
At a basic level, you should print to either stdout, or to stderr, that alone will give you a huge amount of information. Just things like, "Discovered 314 records, we need to save 240 records", "Opened file name X.csv, Open file succeeded (or failed, as the case may be)", "Error: whatever", "Saved 2315 records to CSV". You should be able to determine if those numbers make sense. (There were 314 records, but it determined 240 of them should be saved, yet it saved 2315? What went wrong!? Time for more logging or investigation!)
Ideally, though, you should take a look at the logging module in Python, as that will let you log stack traces effectively and show line numbers, the function you're logging in, and the like. Using the logging module allows you to specify logging levels (e.g. DEBUG, INFO, WARNING, ERROR), and to filter them or redirect them to a file or the console as you choose, without changing the logging statements themselves.
When you have a problem (a crash, or whatever), you'll be able to identify roughly where the error occurred, giving you information to either increase the logging in that area, or to reason about what must have happened (though you should probably then add enough logging so that the logging tells you what happened clearly and unambiguously).
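For instance, a minimal setup along those lines might look like this (the file name, format, and function are just an illustration):
import logging

logging.basicConfig(
    filename="script.log",  # illustrative file name
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(funcName)s:%(lineno)d %(message)s",
)

log = logging.getLogger(__name__)

def save_records(records):
    log.info("Discovered %d records, we need to save %d records",
             len(records), len(records))
    try:
        # ... do the actual CSV writing here ...
        log.info("Saved %d records to CSV", len(records))
    except Exception:
        # logger.exception logs the message at ERROR level plus the full stack trace.
        log.exception("Error while saving records")
        raise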
I need to run a python program in the background. I give the script one input file, and the code processes that file and creates a new output file. If I change the input file's content, I don't want to have to run the code again by hand; it should keep running in the background and regenerate the output file. Please let me know if someone knows the answer to this.
Thank you.
Basically, you have to set up a so-called FileWatcher, i.e. some mechanism which looks out for changes in a file.
There are several techniques for watching file/directory changes in python. Have a look at this question: Monitoring contents of files/directories?. Another link is here, this is about directory changes but file changes are handled in a similar way. You could also google for "watch file changes python" in order to get a lot of answers :)
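If you don't want a third-party library, a simple polling watcher based on the input file's modification time is often enough. A sketch (process stands in for whatever your script already does to turn the input file into the output file):
import os
import time

def watch_and_process(input_path, process, interval=1.0):
    # Poll the input file's modification time and re-run `process`
    # whenever it changes.
    last_mtime = None
    while True:
        try:
            mtime = os.path.getmtime(input_path)
        except OSError:
            mtime = None  # file is missing; wait for it to appear
        if mtime is not None and mtime != last_mtime:
            last_mtime = mtime
            process(input_path)
        time.sleep(interval)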
Note: If you're programming on Windows, you should probably implement your program as a Windows service; look here for how to do that.