I wrote a Python script that contains several print statements. The printed information helps me monitor the script's progress. But when I qsub a bash script containing python my_script &> output onto the computing nodes, the output file stays empty even while the script is running and printing; it only receives the output once the script finishes. How can I see the output in the output file in real time while the script is running?
Actually write to a file rather than piping, and flush after each write; or, after each print, call sys.stdout.flush(). But you are better off using a logger function and replacing the prints with log calls.
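A minimal sketch of the first suggestion, assuming the log file is called output.log and the loop stands in for your real work:
import time

log = open("output.log", "w")
for i in range(10):
    time.sleep(1)                        # stand-in for the actual computation
    log.write("step %d done\n" % i)
    log.flush()                          # push the line out of Python's buffer right away
log.close()
If you keep the print statements and the &> redirect instead, calling sys.stdout.flush() after each print has the same effect on the redirected output file.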
From Comments:
A logger function is one that you call instead of print; it outputs the text somewhere, possibly timestamped and with other information. Loggers usually let you send varying amounts of information to various destinations, including stdout and files. See the Python 2 or 3 documentation for Python's built-in logging module.
I like to write to sys.stderr for this sort of thing; it obviates the need to flush so much. But if you're generating output that is meant to be piped, you're still better off with sys.stdout.
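As a tiny illustration of that suggestion (the message text is made up):
import sys

for count in range(5):
    # stderr is typically unbuffered (Python 2) or line-buffered (Python 3),
    # so each message tends to show up immediately without an explicit flush
    sys.stderr.write("processed %d records so far\n" % count)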
Related
Currently I am redirecting a script to a log file with the following command:
python /usr/home/scripts/myscript.py 2>&1 | tee /usr/home/logs/mylogfile.log
This seems to work, but it does not write to the file as soon as there is a print call; rather, it waits until it has accumulated a group of lines to write. I want the console and the log file to be written to simultaneously. How can this be done with output redirection? Note that running the script directly on the console prints everything when it should, but doing a tail -f on the logfile is not smooth, since it receives about 50 lines at a time. Any suggestions?
It sounds like the shell is actually what's doing the buffering, since you say it outputs as expected to the console when not tee'd.
You could look at this post for potential solutions to undo that shell buffering: https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe
But I would recommend doing it entirely within Python, so you have more direct control, and instead of printing to stdout, use the logging module.
This gives you additional flexibility: multiple logging levels, the ability to add multiple destinations to the logger centrally (e.g. stdout and a file, including one that rotates by size via logging.handlers.RotatingFileHandler), and you are no longer subject to the shell's external buffering.
More info: https://docs.python.org/2/howto/logging.html
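For instance, a minimal sketch of that setup (the logger name, format, and rotation sizes are just illustrative; the file path is the one from the question):
import logging
import logging.handlers
import sys

logger = logging.getLogger("myscript")
logger.setLevel(logging.INFO)

formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

# console handler: writes to stdout and flushes after every record
console = logging.StreamHandler(sys.stdout)
console.setFormatter(formatter)
logger.addHandler(console)

# rotating file handler: rolls over at ~1 MB, keeping 3 old copies
rotating = logging.handlers.RotatingFileHandler(
    "/usr/home/logs/mylogfile.log", maxBytes=1024 * 1024, backupCount=3)
rotating.setFormatter(formatter)
logger.addHandler(rotating)

logger.info("this line reaches the console and the log file immediately")
Because each handler flushes after every record, the log file stays current while the script runs, without any of the shell's pipe buffering.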
I am writing a program in which I would like to be able to view a log file before the program is complete. I have noticed that, in Python (2.7 and 3), file.write() does not immediately save the data to the file; file.close() does. I don't want to create a million little log files with unique names, but I would like to be able to view the updated log file before the program is finished. How can I do this?
Now, to be clear, I am scripting using Ansys Workbench (trying to batch some CFX runs). Here's a link to a tutorial that shows what I'm talking about. They appear to have wrapped Python, and by running the script I can send commands to the various modules. While the script is running there is no console on screen, and it appears to be eating all of the print statements, so the only way I can report what's happening is via a file. Also, I don't want to bring a console window up, because eventually I will just run the program in batch mode (no interface). But the simulations take a long time to run, and I can't wait for the program to finish before checking on what's happening.
You would need this:
file.flush()
# flush() alone is usually enough; the fsync() below additionally asks the OS to write its buffers to disk
os.fsync(file.fileno())
Check this: http://docs.python.org/2/library/stdtypes.html#file.flush
file.flush()
Flush the internal buffer, like stdio's fflush(). This may be a no-op on some file-like objects.
Note flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.
EDITED: See this question for detailed explanations: what exactly the python's file.flush() is doing?
Does file.flush() after each write help?
Hannu
This will write the file to disk immediately:
file.flush()
os.fsync(file.fileno())
According to the documentation https://docs.python.org/2/library/os.html#os.fsync
Force write of file with filedescriptor fd to disk. On Unix, this calls the native fsync() function; on Windows, the MS _commit() function.
If you’re starting with a Python file object f, first do f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal buffers associated with f are written to disk.
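Putting the two calls together, here is a small sketch of a helper (the function and file names are made up for illustration) that forces every message onto disk as soon as it is logged:
import os

def log_line(f, message):
    f.write(message + "\n")
    f.flush()                  # empty Python's internal buffer
    os.fsync(f.fileno())       # ask the OS to write its buffers to disk

logfile = open("run.log", "w")
log_line(logfile, "starting CFX run")
# ... long-running work happens here ...
log_line(logfile, "run finished")
logfile.close()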
Okay, so, I just have a quick question regarding Python and Linux.
I have a program that collects and outputs data to stdout indefinitely. I need to parse this data, and I have a python program I wrote that will do just that. However, I cannot save this data to a file first, as it produces far too much output to save to disk. Is there any way to use redirects to somehow pipe this output into the program?
Example:
python parser.py < ./dataCollector.sh
Close, but you want an actual pipe not a shell redirect:
./dataCollector.sh | python parser.py
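For completeness, a rough sketch of what the receiving end of that pipe could look like (the "ERROR" check is a placeholder for the real parsing logic in parser.py):
import sys

while True:
    line = sys.stdin.readline()    # readline() avoids the read-ahead buffering of "for line in sys.stdin" on Python 2
    if not line:                   # empty string means EOF: the collector exited
        break
    if "ERROR" in line:            # placeholder for the real parsing
        sys.stdout.write(line)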
I have a simple command-line utility that produces output both on the console and on the filesystem. While I know very well how to capture the console output, I am not aware of how I can also intercept the file, whose name I know in advance.
I would like to keep the execution "in memory" without touching the filesystem, since I immediately parse and delete the created file, and the disk round-trip creates an unnecessary bottleneck (especially when I need to run the little tool millions of times).
So, to sum up, I am trying to achieve the following:
Run a binary using Python's subprocess
Capture both the tool's output AND contents of a file it creates (in current working directory with in-advance known name)
Ideally, run it all without touching the filesystem.
Since you only need to support Linux, one possibility is to use named pipes. The idea is to pre-create the output file as a named pipe, and have your process read the tool's output from the pipe.
See, for example, Introduction to Named Pipes.
The Python API is os.mkfifo().
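A rough sketch of that idea, assuming a Linux host; the tool name (./the_tool) and the output file name (result.dat) are placeholders, and the reader thread will block if the tool never opens its output file:
import os
import subprocess
import threading

OUTPUT_NAME = "result.dat"       # the file name the tool uses, known in advance (placeholder)

os.mkfifo(OUTPUT_NAME)           # pre-create the "file" as a named pipe in the current directory

captured = {}

def drain_fifo():
    # open() blocks until the tool opens the pipe for writing; read() returns once it closes it
    with open(OUTPUT_NAME, "rb") as fifo:
        captured["file"] = fifo.read()

reader = threading.Thread(target=drain_fifo)
reader.start()

proc = subprocess.Popen(["./the_tool"],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
console_out, console_err = proc.communicate()    # the console output, kept in memory
reader.join()

os.unlink(OUTPUT_NAME)           # remove the pipe entry; the file data itself never hit the disk
Note that the FIFO still creates a directory entry, but the file's contents pass through the kernel pipe buffer rather than being written to disk.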
I have tests that can take up to 15 minutes at a time to run. During these 15 minutes, a log file is periodically written to; however, most of its content is useless.
In response to this, I have a Python script that parses out the useless text and displays the relevant data.
What I'm trying to achieve is similar to what tail -f log_file does: constantly updating the terminal with the newest additions to a file. I was thinking that if a Python script ran as a process, it could parse the log file whenever the tests write to it, then go back to sleep until the log file is written to again.
Any ideas how one can achieve this?
I already have a script that does the parsing; I just don't know how to make it run continually and efficiently.
You could just have the script filter standard input, and pipe tail -f through it. When you're waiting on stdin, your script will sleep, so it's plenty efficient.
Eg.
python long_running_script.py & tail -f log_file | python filter_logs.py
Your script can be something like
import sys
while True:
    line = sys.stdin.readline()
    if not line: break                      # stop at EOF
    if filter_line(line): print line        # filter_line() is your existing filter
It looks like you need something like "pytailer":
http://code.google.com/p/pytailer/
While I have never used it myself, the last example looks like what you want.
Any ideas how one can achieve this?
This should be pretty easy to do. Most of what you want is already part of your OS.
python test.py | python log_parser.py
Be sure your tests write their log to stdout instead of some other file. This is often easy to do with small changes to the logging configuration.
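For example, if the tests use the logging module, redirecting their output to stdout can be as small as this (the format string is just an example):
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format="%(asctime)s %(message)s")
logging.info("this now flows through the pipe into log_parser.py")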
Having implemented almost this exact tool, I had great success using the inotify capability in Twisted.
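In case it helps, a rough sketch of how that could look (Linux only; the log path and parse_new_lines() are placeholders standing in for your own parsing code):
from twisted.internet import inotify, reactor
from twisted.python import filepath

LOG_FILE = "/path/to/log_file"                      # placeholder path to the test log

def parse_new_lines(path):
    pass                                            # placeholder: call your existing parser here

def on_change(ignored, path, mask):
    # fires whenever the tests modify the log file
    parse_new_lines(path.path)

notifier = inotify.INotify()
notifier.startReading()
notifier.watch(filepath.FilePath(LOG_FILE),
               mask=inotify.IN_MODIFY, callbacks=[on_change])
reactor.run()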