Where to run a Python file on a remote Debian server - python

I have written a Python script that is designed to run forever. I load the script into a folder that I made on my remote server, which is running Debian Wheezy 7.0. The code runs, but it only runs for 3 to 4 hours and then just stops, and I have no log information on why it stopped. When I come back and check the running processes, it's not there. Is this a problem with where I am running the Python file from? The script simply has a while loop and writes to an external CSV file. The file runs from /var/pythonscript, a custom folder that I made. I receive no error, and the only way I know how long the code ran is the timestamp on the CSV file. I run the .py file by SSHing to the server and running sudo python scriptname.
I would also like to know the best place in the Debian directory tree to run Python files from, and any limitations concerning that. Any help would be much appreciated.

Basically you're stuffed.
Your problem is:
You have a script which produces no error messages, no logging, and no other diagnostic information beyond a single timestamp on an output file.
Something has gone wrong.
In this case, you have no means of finding out what the issue was. I suggest any of the following:
1. Add logging or diagnostic information to the script.
2. Contact the developer of the script and get them to find a way of determining the issue.
3. Delete the evidently worthless script if you can't do option 1 or 2 above, and consider an alternative way of doing your task.
Now, if the script does have logging or other diagnostic data, but you delete or throw them away, then that's your problem and you need to stop discarding this useful information.
EDIT (following comment).
At a basic level, you should print to either stdout or stderr; that alone will give you a huge amount of information. Just things like: "Discovered 314 records, we need to save 240 records", "Opened file X.csv (or failed to, as the case may be)", "Error: whatever", "Saved 2315 records to CSV". You should be able to determine whether those numbers make sense. (There were 314 records, it determined 240 of them should be saved, yet it saved 2315? What went wrong!? Time for more logging or investigation!)
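For example (a trivial sketch; the record counts are made up, and stderr keeps diagnostics separate from the script's real output):

import sys

print("Discovered 314 records, we need to save 240 records", file=sys.stderr)
print("Opened file X.csv", file=sys.stderr)
print("Saved 240 records to CSV", file=sys.stderr)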
Ideally, though, you should take a look at the logging module in Python, as that will let you log stack traces effectively, show line numbers, the function you're logging from, and the like. Using the logging module allows you to specify logging levels (e.g. DEBUG, INFO, WARN, ERROR), and to filter them or redirect them to a file or the console as you choose, without changing the logging statements themselves.
When you have a problem (a crash, or whatever), you'll be able to identify roughly where the error occurred, giving you information to either increase the logging in that area or to reason out what must have happened (though you should probably then add enough logging so that the log tells you what happened clearly and unambiguously).
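A minimal sketch of the setup described above (the file name, format, and messages are all illustrative):

import logging

# Configure once, near the top of the script: timestamps, severity,
# and the exact source location of every message.
logging.basicConfig(
    filename="script.log",
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(filename)s:%(lineno)d %(message)s",
)
log = logging.getLogger(__name__)

try:
    log.info("Opened X.csv, found 314 records")
    # ... the actual work of the script ...
except Exception:
    # logs the full stack trace, which a bare print never gives you
    log.exception("Unhandled error, aborting")
    raise

With that in place, the log file will tell you exactly where and when the script died.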

Related

How to keep checking a directory on a Linux system for whether a log file has been created, using Python

A feature will be launched that creates logs as its run proceeds. So I have to write a script that keeps checking a directory during the runtime of the above feature to see whether any log files have been created; if I see logs being created, I will take further actions. The tricky part is that I do not have access to watch, cron jobs, or anything like that at the customer site, so any other suggestion would be appreciated. Also, I cannot install any Python libraries, so I need something very basic.
I haven't tried anything yet, but short of finding an existing function for this, I am planning to use a while loop to keep monitoring the directory.
If you cannot convince the client to install inotify-tools in order to have access to inotifywait, then you need to keep track of not only the existence of an output file but, more importantly, whether the output file is closed (whether the process is finished with it).
In other words, the process creating the file would maintain a second "flag file" (call it ${process_name}.writing), created before output begins and removed when output completes.
As for the conditional logic: if output.txt exists and ${process_name}.writing does not, then output.txt is complete and usable.
You can also consider using the flock utility to test for and acquire exclusive use of a file, to ensure there are no conflicts when picking up "closed" files.
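A minimal polling sketch of the flag-file convention above, using only the standard library (the file names are the illustrative ones from this answer):

import os
import time

log_path = "output.txt"
flag_path = "process_name.writing"

while True:
    # output.txt exists and the writer has removed its flag:
    # the file is complete and safe to use.
    if os.path.exists(log_path) and not os.path.exists(flag_path):
        print("log file is complete, taking further actions")
        break
    time.sleep(5)  # poll every few seconds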

Safe way to view json currently being written by Python code

I have a script I'm running a bunch of times that generates and logs data in JSON files. These take days to run and I need to run several dozen test cases. I log progress in JSON files for post-processing. I'd like to check in occasionally to see how long each run has left. This is all single-threaded, but I've dealt with multiprocessing enough to be scared of opening the file while it's being written, for fear that viewing it will place a temporary lock on the file.
Is it safe to view the json in a linux terminal using nano log_file.json while my Python scripts are running and could attempt to write to the log at any time?
If it is not safe, are there any alternatives?
I'm worried that if Python tries to record an entry while I'm viewing progress, the entry could be lost or an error could be thrown. Viewing only, no saving, obviously. I'd love to check in on progress to switch between test cases faster, but I really don't want to raise an error that loses days of progress because the script is unable to write to the JSON.
Sorry if this is a duplicate, I tried searching but I'm not sure what to even search for this question.
You can use the tail command in a terminal to view the logs. The full command is:
tail -F <path_to_file>
It will show the last few lines of the file and continue to show new data as it is written to the file.
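If you'd rather check from Python, a read-only open is equally safe: on Linux, opening a file for reading takes no lock and cannot block the writer. A minimal sketch, assuming the log holds one JSON record per line (adjust for your actual format):

import json

with open("log_file.json") as f:
    for line in f:
        try:
            print(json.loads(line))
        except json.JSONDecodeError:
            break  # the last line may be mid-write; skip the partial record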

python WatchedFileHandler still writing to old file after rotation

I've been using WatchedFileHandler as my Python logging file handler, so that I can rotate my logs with logrotate (on Ubuntu 14.04), which, you know, is what the docs say it's for. My logrotate config file looks like
/path_to_logs/*.log {
daily
rotate 365
size 10M
compress
delaycompress
missingok
notifempty
su root root
}
Everything seemed to be working just fine. I'm using logstash to ship my logs to my Elasticsearch cluster and everything is great. I added a second log file for my debug logs, which gets rotated but is not watched by logstash. I noticed that when that file is rotated, Python just keeps writing to /path_to_debug_logs/*.log.1 and never starts writing to the new file. If I manually tail /path_to_debug_logs/*.log.1, it switches over instantly and starts writing to /path_to_debug_logs/*.log.
This seems REALLY weird to me.
I believe what is happening is that logstash is always tailing my non-debug logs, which somehow triggers the switchover to the new file after logrotate is called. If logrotate is called twice without a switchover, the log.1 file gets moved and compressed to log.2.gz, which Python can no longer log to, and logs are lost.
Clearly there are a bunch of hacky solutions to this (such as a cron job that tails all my logs every now and then), but I feel like I must be doing something wrong.
I'm using WatchedFileHandler and logrotate instead of RotatingFileHandler for a number of reasons, but mainly because it will nicely compress my logs for me after rotation.
UPDATE:
I tried the horrible hack of adding a manual tail to the end of my log rotation config script.
sharedscripts
postrotate
/usr/bin/tail -n 1 path_to_logs/*.log.1
endscript
Sure enough this works most of the time, but it randomly fails sometimes for no clear reason, so it isn't a solution. I've also tried a number of less hacky solutions where I've modified the way WatchedFileHandler checks whether the file has changed, but no luck.
I'm fairly sure the root of my problem is that the logs are stored on a network drive, which is somehow confusing the file system.
I'm moving my rotation to python with RotatingFileHandler, but if anyone knows the proper way to handle this I'd love to know.
Use the copytruncate option of logrotate. From the docs:
copytruncate
Truncate the original log file in place after creating a copy, instead of moving the old log file and optionally creating a new one. It can be used when some program cannot be told to close its logfile and thus might continue writing (appending) to the previous log file forever. Note that there is a very small time slice between copying the file and truncating it, so some logging data might be lost. When this option is used, the create option will have no effect, as the old log file stays in place.
WatchedFileHandler does a rollover when a device and/or inode change is detected in the log file just before writing to it. Perhaps the file which isn't being watched by logstash doesn't see a change in its device/inode? That would explain why the handler keeps on writing to it.
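For reference, a rough sketch of the check WatchedFileHandler performs before each write: it stats the path and reopens the stream if the device/inode no longer match (so if the stat result never changes, it never rolls over).

import os

def file_was_rotated(stream, path):
    # Compare the dev/inode of the open stream with the path on disk.
    st_stream = os.fstat(stream.fileno())
    try:
        st_path = os.stat(path)
    except FileNotFoundError:
        return True  # the file was moved away entirely
    return (st_stream.st_dev, st_stream.st_ino) != (st_path.st_dev, st_path.st_ino)

Network filesystems can report stale or unchanging inode data, which would fit the theory that the network drive is confusing this check.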

Python fabric put statistics

When I put a file on a remote server (using put()), is there any way I can see the upload information or statistics printed to the stdout file descriptor?
There's no such way, according to the documentation. You could, however, try the project's tools.
There's also the option to play with fabric's local function, but that of course breaks the whole host concept.
There's also no way to make fabric more verbose than the default (except for debugging). This makes sense, because fabric doesn't really work with terminal escape sequences to delete lines again; displaying transfer statistics would print way too many lines. This would actually be a nice feature: detecting line deletions within fabric and applying them (just throwing the idea out for a potential pull request).
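One hedged sketch of the local() workaround mentioned above: shell out to scp and let it print its own progress meter. The host string and paths are hypothetical, this assumes fabric 1.x, and scp only shows its meter when attached to a terminal:

from fabric.api import local  # fabric 1.x

def put_with_stats(local_path, remote_path, host="user@example.com"):
    # scp prints its own transfer progress, which put() never does
    local("scp %s %s:%s" % (local_path, host, remote_path))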

In python, why use logging instead of print?

For simple debugging in a complex project, is there a reason to use the Python logger instead of print? What about other use cases? Is there an accepted best use case for each (especially when you're only looking at stdout)?
I've always heard that this is a "best practice" but I haven't been able to figure out why.
The logging package has a lot of useful features:
Easy to see where and when (even what line number) a logging call is being made from.
You can log to files, sockets, pretty much anything, all at the same time.
You can differentiate your logging based on severity.
Print doesn't have any of these.
Also, if your project is meant to be imported by other Python tools, it's bad practice for your package to print things to stdout, since the user likely won't know where the print messages are coming from. With logging, users of your package can choose whether or not they want to propagate logging messages from your tool.
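The conventional pattern for a library looks like this (a sketch of standard practice, not specific to any one project): attach a NullHandler so the library stays silent unless the application configures logging.

import logging

logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())

def do_work():
    logger.debug("starting work")  # silent by default, visible when enabled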
One of the biggest advantages of proper logging is that you can categorize messages and turn them on or off depending on what you need. For example, it might be useful to turn on debugging level messages for a certain part of the project, but tone it down for other parts, so as not to be taken over by information overload and to easily concentrate on the task for which you need logging.
Also, logs are configurable. You can easily filter them, send them to files, format them, add timestamps, and any other things you might need on a global basis. Print statements are not easily managed.
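For example (the module names here are hypothetical):

import logging

# Global default: only warnings and above.
logging.basicConfig(level=logging.WARNING)

# ...but turn on full debug output for the one part under investigation.
logging.getLogger("myapp.parser").setLevel(logging.DEBUG)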
Print statements are sort of the worst of both worlds, combining the negative aspects of an online debugger with diagnostic instrumentation. You have to modify the program, but you don't get more useful code out of it.
An online debugger allows you to inspect the state of a running program, and the nice thing about a real debugger is that you don't have to modify the source, neither before nor after the debugging session; you just load the program into the debugger, tell the debugger where you want to look, and you're all set.
Instrumenting the application might take some work up front, modifying the source code in some way, but the resulting diagnostic output can have enormous amounts of detail, and can be turned on or off to a very specific degree. The Python logging module can show not just the message logged, but also the file and function that called it, a traceback if there was one, the actual time the message was emitted, and so on. More than that, diagnostic instrumentation need never be removed; it's just as valid and useful when the program is finished and in production as it was the day it was added, but it can have its output stuck in a log file where it's not likely to annoy anyone, or the log level can be turned down to keep all but the most urgent messages out.
Anticipating the need for a debugger is really no harder than using ipython while you're testing and becoming familiar with the commands it uses to control the built-in pdb debugger.
When you find yourself thinking that a print statement might be easier than using pdb (as it often is), you'll find that using a logger leaves your program in a much easier-to-work-on state than if you use and later remove print statements.
I have my editor configured to highlight print statements as syntax errors, and logging statements as comments, since that's about how I regard them.
In brief, the advantages of using a logging library over print come down to the following:
Control what’s emitted
Define what types of information you want to include in your logs
Configure how it looks when it’s emitted
Most importantly, set the destination for your logs
In detail, segmenting log events by severity level is a good way to sift through which log messages may be most relevant at a given time; a log event's severity level also gives you an indication of how worried you should be when you see a particular message. For instance, you can divide logging into debug, info, warning, error, and critical levels. Timing can be everything when you're trying to understand what went wrong with an application. You want to know the answers to questions like:
“Was this happening before or after my database connection died?”
“Exactly when did that request come in?”
Furthermore, it is easy to see where a log event occurred, via the filename, line number, and method name, and even which thread it came from.
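A sketch of a format string carrying that metadata; every field used here is a standard attribute of the logging module:

import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(threadName)s] "
           "%(filename)s:%(lineno)d %(funcName)s() %(message)s",
    level=logging.INFO,
)
logging.getLogger(__name__).warning("database connection died")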
Here's a functional logging library for Python named loguru.
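A minimal loguru sketch (the file name and rotation size are just examples):

from loguru import logger

logger.info("works out of the box with a pre-configured stderr sink")
logger.add("app.log", rotation="10 MB", level="DEBUG")  # add a rotating file sink
logger.debug("this also goes to app.log")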
If you use logging then the person responsible for deployment can configure the logger to send it to a custom location, with custom information. If you only print, then that's all they get.
Logging essentially creates a searchable plain-text database of print outputs along with other metadata (timestamp, log level, line number, process, etc.).
This is pure gold, I can run egrep over the log file after the python script has run.
I can tune my egrep pattern to pick out exactly what I am interested in and ignore the rest. This reduction in cognitive load, and the freedom to pick my egrep pattern later by trial and error, is the key benefit for me.
tail -f mylogfile.log | egrep "key_word1|key_word2"
Now throw in the other cool things that print can't do (sending to a socket, setting debug levels, logrotate, adding metadata, etc.) and you have every reason to prefer logging over plain print statements.
I tend to use print statements because it's lazy and easy, and adding logging needs some boilerplate code. But hey, we have yasnippets (emacs), ultisnips (vim), and other templating tools, so why give up logging for plain print statements!?
I would add to all the other advantages mentioned that the print function in its standard configuration is buffered: the flush may occur only at the end of the current block (the one the print is in).
This is true for any program launched from a non-interactive shell (CodeBuild or GitLab CI, for instance) or whose output is redirected.
If the program is killed for any reason (kill -9, a hard reset of the computer, …), you may be missing some lines of logs if you used print for them.
However, the logging library ensures that logs printed to stderr and stdout are flushed immediately on every call.
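A small sketch of the difference (kill the process mid-run with output redirected to see it):

import logging
import sys

print("progress: step 1")              # block-buffered when stdout is redirected
print("progress: step 2", flush=True)  # explicit flush works around it

# logging.StreamHandler flushes after every record it emits.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logging.info("progress: step 3")       # reaches stderr immediately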
