Running a Python script indefinitely (as a process, pretty much)

I have tests that can take up to 15 minutes at a time. During those 15 minutes, a log file is periodically written to; however, most of its content is useless.
In response to this, I have a Python script that parses out the useless text and displays the relevant data.
What I'm trying to achieve is similar to what tail -f log_file does: constantly updating the terminal with the newest additions to the file. I was thinking that if a Python script ran as a process, it could parse the log file whenever the tests write to it, then go back to sleep until the log file is written to again.
Any ideas how one can achieve this?
I already have a script that does the parsing; I just don't know how to make it run continually and efficiently.

You could just have the script filter standard input and pipe tail -f into it. While your script is waiting on stdin it will sleep, so it's plenty efficient.
E.g. (note the single &, so the tests run in the background while tail -f follows the log):
python long_running_script.py & tail -f log_file | python filter_logs.py
Your script can be something like:
import sys

for line in sys.stdin:        # blocks until a line arrives, so the process sleeps in between
    if filter_line(line):     # filter_line is your existing parsing logic
        print(line, end='')

Looks like you need something like "pytailer":
http://code.google.com/p/pytailer/
While I've never used it myself, the last example in its docs looks like what you want.
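A minimal sketch, assuming the tailer.follow API shown on that project page (I haven't verified it against a current release):
import tailer  # third-party: the pytailer package

# follow the file like `tail -f`, yielding lines as they are appended
for line in tailer.follow(open('log_file')):
    if filter_line(line):  # your existing parsing logic
        print(line)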

"Any ideas how one can achieve this?"
This should be pretty easy to do. Most of what you want is already part of your OS.
python test.py | python log_parser.py
Be sure your tests write their log to stdout instead of some other file. This is often easy to do with small changes to the logging configuration.
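For example, if the tests use the standard logging module, one line of configuration (a sketch; your handler setup may differ) routes everything to stdout:
import logging
import sys

# send all log records to stdout so they can flow down a pipe
logging.basicConfig(stream=sys.stdout, level=logging.INFO)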

Having implemented almost this exact tool, I had great success using the inotify capability in Twisted.
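For what it's worth, a rough sketch of that approach with Twisted's twisted.internet.inotify (the callback body and path are illustrative):
from twisted.internet import inotify, reactor
from twisted.python.filepath import FilePath

def on_change(watch, path, mask):
    # re-parse (or incrementally read) the log whenever it changes
    print('event on', path.path, inotify.humanReadableMask(mask))

notifier = inotify.INotify()
notifier.startReading()
notifier.watch(FilePath('/path/to/log_file'), callbacks=[on_change])
reactor.run()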

Related

Python3 curses code not working in pipeline

I am writing a Python script that I want to use in a Unix pipeline. My goal is to write to the screen using curses (which should only be seen by the person running the command, not the pipe), and then write the "return value" to stdout at the end so it can continue down the pipeline, something along the lines of ./myscript.py | consumer_script
This was failing in mysterious ways until I found this answer. The suggested solution was to use newterm instead of initscr.
My problem is that I am using Python, and from what I could find in the documentation, newterm doesn't exist there. All I was able to find was a single reference to newterm, and it didn't come with a link.
Could someone please either point me towards the Python newterm, or suggest another way of working with pipes and curses?
I think you're making this more complicated than it needs to be: the simple answer is to write the curses stream to a handle other than stdout. If that works for you, stderr is the obvious choice. In short, anything that gets written to stdout goes into the pipeline; if you don't want it there, you need a different handle.
Check out this thread for ways to write to stderr in Python:
How to print to stderr in Python?
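Since Python's curses module doesn't expose newterm, one workaround is to repoint file descriptor 1 at the terminal (via stderr) while curses runs, keeping the real stdout for the pipeline. This is a sketch of the idea, not a battle-tested recipe:
import curses
import os

real_stdout = os.dup(1)   # save the pipeline end of stdout
os.dup2(2, 1)             # fd 1 now refers to the terminal (assuming stderr is a tty)

screen = curses.initscr() # curses draws on fd 1, i.e. the terminal
try:
    screen.addstr(0, 0, 'interactive UI, invisible to the pipe')
    screen.refresh()
    curses.napms(1000)
finally:
    curses.endwin()
    os.dup2(real_stdout, 1)  # restore stdout for the pipeline
    os.close(real_stdout)

print('final result')  # this goes down the pipe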

Displaying all results of execution

When I execute a Python program, the results start to appear quickly and I can't read them all; they just flush over my screen.
When the execution ends, I can no longer see the first output, because the terminal's display space is limited.
How can I save the output, so I can read all of it?
You have a few options here.
Add a breakpoint and learn how to use the debugger. Once you add import pdb; pdb.set_trace() at a specific point, execution will stop there when you run the code, so you can inspect output at your own pace. (This takes some learning, so look up what pdb is online; personally I prefer ipdb.)
Save the output to a file (python file.py > filename.txt) and read it afterwards. Note that > only captures stdout; if some output seems to be missing, it is probably on stderr, which python file.py > filename.txt 2>&1 also captures (see https://askubuntu.com/questions/625224/how-to-redirect-stderr-to-a-file).
(More advanced) Your code is simply printing too much noise. You can remove some of the output statements or use Python logging filters; see the sketch below.
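A minimal sketch of the logging-filter idea (the DropNoise class and WARNING threshold are illustrative, not from the question):
import logging

class DropNoise(logging.Filter):
    # keep only warnings and above; chattier records are suppressed
    def filter(self, record):
        return record.levelno >= logging.WARNING

handler = logging.StreamHandler()
handler.addFilter(DropNoise())
logging.basicConfig(handlers=[handler], level=logging.DEBUG)

logging.debug('noisy detail')     # filtered out
logging.warning('worth reading')  # shown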
This may be platform dependent.
On Linux you can also pipe your program output into your favorite pager (less for example) if you don't want to write it to a file.
python file.py | less

Write and save a file with nano using subprocess

How can I write/append to a file by calling nano via subprocess and have it saved automatically? For example, I have a file and I want to open it and append something at the end of it, so I write:
>>> import tempfile
>>> f = tempfile.NamedTemporaryFile(mode='a', delete=False)  # delete=False so closing doesn't remove it
>>> example = f.name
>>> f.close()
>>> import subprocess
>>> subprocess.call(['nano', example])
Once the last line is executed the file opens in nano, and I can write anything and then save it by hitting Ctrl+O and Ctrl+X.
Instead, I want to send the input through a stdin PIPE and have the file saved by itself, i.e. some mechanism that hits Ctrl+O and Ctrl+X automatically.
Can anyone help me solve this?
A Ctrl+O is just a character, same as any other. You can send it by writing '\x0f' (or, in Python 3, b'\x0f').
However, that probably isn't going to do you any good. Most programs that provide an interactive GUI in the terminal, like nano, cannot be driven by stdin. They need to take control of the terminal, and to do that they will either check that stdin isatty and then tcsetattr it, or just open /dev/tty directly.
You can deal with this by creating a pseudo-terminal with os.openpty, os.forkpty, or pty.
But it's often easier to use a library like pexpect to deal with interactive programs, GUI or otherwise.
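For instance, a hedged sketch of driving nano with pexpect (the 'File Name' prompt text varies between nano versions, so the expect pattern is an assumption):
import pexpect

child = pexpect.spawn('nano', ['example.txt'])
child.send('appended text')   # typed into the edit buffer
child.sendcontrol('o')        # Ctrl+O: write out
child.expect('File Name')     # nano asks to confirm the filename
child.send('\r')              # accept the suggested name
child.sendcontrol('x')        # Ctrl+X: exit
child.expect(pexpect.EOF)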
And it's even easier not to try to drive an interactive program in the first place. For example, unlike nano, ed is designed to be driven in "batch mode" by a script, and sed even more so.
And it's easier still not to drive an external program at all when the job can be done just as easily in Python directly. The easiest way to append something to a file is to open it in 'a' mode and write to it; no external program needed. For example:
new_line = input('What do you want to add? ')
with open(fname, 'a') as f:   # fname is the path to your file
    f.write(new_line + '\n')
If the only reason you were using nano is that you needed something to sudo… there's really no reason for that. You can sudo anything else (like sed, or another Python script) just as easily. Using nano just makes things harder for yourself for absolutely no reason.
The big question here is: why do you have a file that's not writable by your Python script, but which you want arbitrary remote users to be able to append to? That sounds like a very bad system design. You make files non-writable because you want to restrict normal users from modifying them; if you want your Python script to be able to modify it on behalf of your remote users, why isn't it owned by the same user that the script runs as?
In the (unlikely) event that you still find that you need to control nano or some other interactive program from a Python process, I'm going to suggest the same thing here that I suggested for this question: Using python subprocess.call() to launch an ncurses process ...
... don't use subprocess for controlling curses/full-screen interactive processes; use pexpect. That's what it's for.
(On the other hand, I also agree with the many comments here regarding better ways to work around the permissions issue: write some sort of script, in Python, bash, sed or whatever, that can be run under sudo and that makes the in-place edits or appends to your data file directly.)

Using Python in a Linux Terminal

Okay, so, I just have a quick question regarding Python and Linux.
I have a program that collects data and outputs it to stdout indefinitely. I need to parse this data, and I have a Python program I wrote that will do just that. However, I cannot save the data to a file first, as it produces far too much output to save to disk. Is there any way to use redirects to pipe this output into the program?
Example:
python parser.py < ./dataCollector.sh
Close, but you want an actual pipe not a shell redirect:
./dataCollector.sh | python parser.py
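The parser side then just reads stdin line by line; a minimal sketch, where parse() stands in for whatever filtering you already do:
import sys

def parse(line):
    # hypothetical filter: keep only the lines you care about
    return line if 'ERROR' in line else None

for line in sys.stdin:
    kept = parse(line)
    if kept is not None:
        sys.stdout.write(kept)
        sys.stdout.flush()  # keep output timely when fed by a live pipe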

Grabbing output FILE from Python Popen process?

I have written a Python program to interface with a compiled program (call it ProgramX) that has some idiosyncrasies which are proving difficult to deal with. I need to feed many thousands of input files to ProgramX via my Python program. What I would like to do is grab the output file that ProgramX creates on each run and rename it to something sensible, like inputfilename.output.
The problem lies in the output file written by ProgramX: it is named by an unpredictable method, and ProgramX will write, and "mercilessly overwrite", the output file if it already exists (which is the case the majority of the time). The saving grace is probably that there is a standard prefix to the output files: think ProgramX.notQuiteRandomNumber.
The only thing I can think to do is something like this in my bash shell:
PROGRAMXOUTPUT=$(ls -ltr ProgramX* | tail -n -1 | awk '{print $8}')
mv $PROGRAMXOUTPUT input.output
Which does 90% of what I need, but before I turn all that bash into a series of Popen statements: is there a better way to do this? This feels like a problem to which people might have a much better solution than what I'm thinking of.
Sidenote: I can grab the program's standard output without problems; however, it's the output file that I need to grab.
Bonus: I was planning on running a bunch of instances of the program in the same directory, so my naive approach above may start to have unforeseen problems. So perhaps something fancy that watches the PID of ProgramX and follows its output.
To do what your shell script above does, assuming you've only got one ProgramX* in the current directory:
import glob, os
programxoutput = glob.glob('ProgramX*')[0]  # assumes exactly one match
os.rename(programxoutput, 'input.output')
If you need to sort by time, etc., there are ways to do that too (look at os.stat), but using the most recent modification date is a recipe for nasty race conditions if you'll be running multiple copies of ProgramX concurrently.
I'd suggest instead that you create and change to a new, perhaps temporary directory for each run of ProgramX, so the runs have no possibility of treading on each other. The tempfile module can help with this.
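A sketch of that per-run isolation (the ProgramX invocation and output prefix are assumptions taken from the question):
import glob
import os
import shutil
import subprocess
import tempfile

def run_programx(input_path, output_path):
    with tempfile.TemporaryDirectory() as tmpdir:
        subprocess.call(['ProgramX', os.path.abspath(input_path)], cwd=tmpdir)
        # the fresh directory holds exactly one ProgramX.* output file
        produced = glob.glob(os.path.join(tmpdir, 'ProgramX*'))[0]
        shutil.move(produced, output_path)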
Two options that I see:
You could use lsof to find the files that ProgramX has open for writing; a Python flavor of that idea is sketched after this list.
A different approach would be to run ProgramX in a temporary directory (see the tempfile module for an easy way of setting up directories). Between runs of ProgramX you can clean that directory, or keep requesting new temp directories if you are planning on running multiple copies of ProgramX at the same time.
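A rough Python equivalent of the lsof idea, using the third-party psutil package (the process name is an assumption):
import psutil

# list files currently open by any running ProgramX process
for proc in psutil.process_iter(['name']):
    if proc.info['name'] == 'ProgramX':
        try:
            for f in proc.open_files():
                print(proc.pid, f.path)
        except psutil.AccessDenied:
            pass  # skip processes we aren't allowed to inspect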
If there is only one ProgramX* file, then what about just:
mv ProgramX* input.output
