I have log files located in:
/mfs/log/scribe/clicklog/*/clicklog_current
which I want to process in real time with Python, so I created a transform.py script and run it like this:
tail -f /mfs/log/scribe/clicklog/*/clicklog_current | grep 'pattern' | ./transform.py
In transform.py:
def process_line(line):
    print real_process(line)
The problem is: how can I call process_line every time there is a new line on stdin?
Whenever redirection or piping happens, the standard input stream is set to that source, so you can read directly from sys.stdin, like this:
import sys
for line in sys.stdin:
    process_line(line)
If the buffering bites you, you can adjust or disable the input buffering.
Reduce the buffering size:
import os
import sys
for line in os.fdopen(sys.stdin.fileno(), 'r', 100):
    process_line(line)
Now it buffers only 100 bytes max.
Disable the buffering:
Quoting the official documentation,
-u
Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode.
Note that there is internal buffering in file.readlines() and File Objects (for line in sys.stdin) which is not influenced by this option. To work around this, you will want to use file.readline() inside a while 1: loop.
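Concretely, that workaround looks like this in transform.py:
import sys

while 1:
    line = sys.stdin.readline()
    if not line:  # EOF: the upstream pipe was closed
        break
    process_line(line)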
The fileinput module may be able to do what you're looking for.
import fileinput
for line in fileinput.input():
    if line == '':
        continue
    process_line(line)
You can get rid of the tail -f part completely by using watchdog, and of grep by using the re module (although in this case you don't even need that, as your search criterion can be written as a simple membership test).
Here is a simple example (modified from the documentation) that would do what you require:
import sys
import time

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class WatchFiles(FileSystemEventHandler):

    def process_file(self, event):
        """
        does stuff with the file
        """
        if event.is_directory:  # directory events carry no readable file
            return
        with open(event.src_path, 'r') as f:
            for line in f:
                if 'pattern' in line:
                    do_stuff(line)

    def on_modified(self, event):
        self.process_file(event)

    def on_created(self, event):
        self.process_file(event)

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else '.'

    observer = Observer()
    observer.schedule(WatchFiles(), path, recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
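To watch the directory tree from the original question, run the script with the log root as its argument, e.g. python watcher.py /mfs/log/scribe/clicklog (watcher.py being whatever name you saved the script under).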
This way, your application is not only more portable, but all of its parts are self-contained.
Related
I want to execute this command in Python:
grep keyMessage logFile.log > keyMessageFile.log
This is what I have done so far:
from subprocess import call
keyMessage = 'keyMessage'
call(["grep", keyMessage, "logFile.log"])
but I don't know how to add the > keyMessageFile.log part.
By the way, the reason I use grep is that it's much faster than reading the file, comparing strings, and writing to a file in pure Python.
Update:
Here is the slower Python code I wrote:
keyMessage = 'keyMessage'

with open('logFile.log') as f:
    for line in f:
        # note: this reopens the output file for every input line
        with open(keyMessage + '.txt', 'a') as newFile:
            if keyMessage not in line:
                continue
            else:
                newFile.write(line)
The simplest way to do this (reasonably safely too) is:
from subprocess import check_call
from shlex import quote
check_call('grep %s logFile.log > keyMessageFile.log' % quote(keyMessage), shell=True)
However, unless you really need grep's regex matching capabilities, and you end up reading keyMessageFile.log in your program anyway, I don't think the following would be unreasonably slow:
def read_matching_lines(filename, key):
    with open(filename) as fp:
        for line in fp:
            if key in line:
                yield line

for matching_line in read_matching_lines('logFile.log', keyMessage):
    print(matching_line)
subprocess.call has a stdout parameter. Pass a file opened for writing to it:
from subprocess import call

keyMessage = 'keyMessage'
with open("keyMessageFile.log", "w") as o:
    call(["grep", keyMessage, "logFile.log"], stdout=o)
subprocess.call is the older API; on Python 3.5+ you should use subprocess.run instead.
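For example, here is a sketch of the same call with subprocess.run; note that grep exits with status 1 when nothing matches, so passing check=True would raise in that case:
from subprocess import run

keyMessage = 'keyMessage'
with open("keyMessageFile.log", "w") as o:
    result = run(["grep", keyMessage, "logFile.log"], stdout=o)
print(result.returncode)  # 0 if grep found matches, 1 if none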
For me this works:
import os

os.system("grep '%s' logFile.log > keyMessageFile.log" % 'looking string')
I am trying to display the output of a system command, but my script produces the result only when I run it twice. Below is the script. Using subprocess.Popen in both places does not produce any output, and the same goes for subprocess.call.
#!/usr/bin/env python
import subprocess
import re

contr = 0
spofchk = 'su - dasd -c "java -jar /fisc/dasd/bin/srmclient.jar -spof_chk"'
res22 = subprocess.call("touch /tmp/logfile", shell=True, stdout=subprocess.PIPE)
fp = open("/tmp/logfile", "r+")
res6 = subprocess.Popen(spofchk, shell=True, stdout=fp)

fil_list = []
for line in fp:
    line = line.strip()
    fil_list.append(line)
fp.close()

for i in fil_list[2:]:
    if contr % 2 == 0:
        if 'no SPOF' in i:
            flag = 0
            #print(flag)
            #print(i)
        else:
            flag = 1
    else:
        continue
    # incrementing the counter by 2 so that we will only read lines with SPOF and no SPOF
    contr += 2
The child process has its own file descriptor and therefore you may close the file in the parent as soon as the child process is started.
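Applied to the question's code, that means something like this (a sketch reusing spofchk from above):
import subprocess

spofchk = 'su - dasd -c "java -jar /fisc/dasd/bin/srmclient.jar -spof_chk"'

# the with block closes the parent's copy of the descriptor right after
# the child starts; the child keeps its own and continues writing
with open('/tmp/logfile', 'wb') as fp:
    process = subprocess.Popen(spofchk, shell=True, stdout=fp)
process.wait()  # the log file is complete once the child exits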
To read the whole child process' output that is redirected to a file, wait until it exits:
import subprocess

# 'command' is the command from the question
with open('logfile', 'wb', 0) as file:
    subprocess.check_call(command, stdout=file)

with open('logfile') as file:
    ...  # read file here
If you want to consume the output while the child process is running, use PIPE:
#!/usr/bin/env python3
from subprocess import Popen, PIPE

with Popen(command, stdout=PIPE) as process, open('logfile', 'wb') as file:
    for line in process.stdout:  # read b'\n'-separated lines
        # handle line...
        # copy it to file
        file.write(line)
Since subprocess opens a new shell, it is not possible on the first run to create the file and write the output of another subprocess to it at the same time. So the only solution for this is to use os.system.
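A sketch of what this answer suggests, reusing the command from the question (os.system does not return until the shell, and therefore the redirection, has finished):
import os

# os.system blocks until the command completes, so /tmp/logfile
# is fully written before we read it back
os.system('su - dasd -c "java -jar /fisc/dasd/bin/srmclient.jar -spof_chk" > /tmp/logfile')

with open('/tmp/logfile') as fp:
    fil_list = [line.strip() for line in fp]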
I have the following shell script that I would like to write in Python (of course grep . is actually a much more complex command):
#!/bin/bash
(cat somefile 2>/dev/null || (echo 'somefile not found'; cat logfile)) \
| grep .
I tried this (which lacks an equivalent to cat logfile anyway):
#!/usr/bin/env python
import StringIO
import subprocess

try:
    myfile = open('somefile')
except:
    myfile = StringIO.StringIO('somefile not found')

subprocess.call(['grep', '.'], stdin=myfile)
But I get the error AttributeError: StringIO instance has no attribute 'fileno'.
I know I should use subprocess.communicate() instead of StringIO to send strings to the grep process, but I don't know how to mix both strings and files.
p = subprocess.Popen(['grep', '...'], stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)
output, output_err = p.communicate(myfile.read())
Don't use bare except, it may catch too much. In Python 3:
#!/usr/bin/env python3
from subprocess import check_output

try:
    file = open('somefile', 'rb', 0)
except FileNotFoundError:
    output = check_output(cmd, input=b'somefile not found')
else:
    with file:
        output = check_output(cmd, stdin=file)
It works for large files: the file is redirected at the file descriptor level, so there is no need to load it into memory.
If you have a file-like object (without a real .fileno()), you can write its content to the pipe directly:
#!/usr/bin/env python3
import io
from shutil import copyfileobj
from subprocess import Popen, PIPE
from threading import Thread

try:
    file = open('somefile', 'rb', 0)
except FileNotFoundError:
    file = io.BytesIO(b'somefile not found')

def write_input(source, sink):
    with source, sink:
        copyfileobj(source, sink)

cmd = ['grep', 'o']
with Popen(cmd, stdin=PIPE, stdout=PIPE) as process:
    Thread(target=write_input, args=(file, process.stdin), daemon=True).start()
    output = process.stdout.read()
The following answer also uses shutil (which is quite efficient), but it avoids running a separate thread, which would otherwise never end and go zombie when stdin ends (as with the answer from @jfs).
import os
import subprocess
import io
from shutil import copyfileobj

file = 'somefile'  # the path to look for (left undefined in the original snippet)
file_exists = os.path.isfile(file)

with open(file) if file_exists else io.StringIO("Some text here ...\n") as string_io:
    with subprocess.Popen("cat", stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                          universal_newlines=True) as process:
        copyfileobj(string_io, process.stdin)
        # the subsequent code is not executed until copyfileobj ends,
        # ... but the subprocess is effectively using the input.
        process.stdin.close()  # close it, or the pipe won't signal EOF
        # do some online processing of process.stdout, for example...
        for line in process.stdout:
            print(line)  # do something
Alternatively to closing and parsing line by line, if the output is known to fit in memory:
...
stdout_text, stderr_text = process.communicate()  # stderr_text is None here, since stderr was not piped
I want to use os.mkfifo for simple communication between programs. I have a problem with reading from the fifo in a loop.
Consider this toy example, where I have a reader and a writer working with the fifo. I want to be able to run the reader in a loop to read everything that enters the fifo.
# reader.py
import os
import atexit

FIFO = 'json.fifo'

@atexit.register
def cleanup():
    try:
        os.unlink(FIFO)
    except:
        pass

def main():
    os.mkfifo(FIFO)
    with open(FIFO) as fifo:
        # for line in fifo:              # closes after single reading
        # for line in fifo.readlines(): # closes after single reading
        while True:
            line = fifo.read()  # will return empty lines (non-blocking)
            print repr(line)

main()
And the writer:
# writer.py
import sys

FIFO = 'json.fifo'

def main():
    with open(FIFO, 'a') as fifo:
        fifo.write(sys.argv[1])

main()
If I run python reader.py and later python writer.py foo, "foo" will be printed but the fifo will be closed and the reader will exit (or spin inside the while loop). I want the reader to stay in the loop so I can execute the writer many times.
Edit
I use this snippet to handle the issue:
def read_fifo(filename):
    while True:
        with open(filename) as fifo:
            yield fifo.read()
but maybe there is some neater way to handle it, instead of repeatedly opening the file...
Related
Getting readline to block on a FIFO
You do not need to reopen the file repeatedly. You can use select to block until data is available.
import select

with open(FIFO_PATH) as fifo:
    while True:
        select.select([fifo], [], [fifo])
        data = fifo.read()
        do_work(data)
In this example you won't read EOF.
A FIFO works (on the reader side) exactly this way: it can be read from until all writers are gone; then it signals EOF to the reader.
If you want the reader to continue reading, you'll have to open it again and read from there. So your snippet is exactly the way to go.
If you have multiple writers, you'll have to ensure that each data portion they write is smaller than PIPE_BUF in order not to mix up the messages; see the sketch below.
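Here is a sketch of an atomic write, assuming the json.fifo path from the example above:
import os

FIFO = 'json.fifo'

# POSIX guarantees that a single write() of at most PIPE_BUF bytes is atomic,
# so messages from concurrent writers will not be interleaved
pipe_buf = os.pathconf(FIFO, 'PC_PIPE_BUF')

message = b'{"key": "value"}\n'
assert len(message) <= pipe_buf, "message too large to be written atomically"

fd = os.open(FIFO, os.O_WRONLY)
try:
    os.write(fd, message)  # one write() call per message
finally:
    os.close(fd)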
The following methods on the standard library's pathlib.Path class are helpful here:
Path.is_fifo()
Path.read_text/Path.read_bytes
Path.write_text/Path.write_bytes
Here is a demo:
# reader.py
import os
from pathlib import Path

fifo_path = Path("fifo")
os.mkfifo(fifo_path)

while True:
    print(fifo_path.read_text())  # blocks until data becomes available
# writer.py
import sys
from pathlib import Path
fifo_path = Path("fifo")
assert fifo_path.is_fifo()
fifo_path.write_text(sys.argv[1])
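As in the original example, start python reader.py first, then run python writer.py foo as many times as you like; each read_text() call returns once a writer has opened, written, and closed the FIFO.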
I use an external library, like this:
from some_lib import runThatProgram
infile = '/tmp/test'
outfile = '/tmp/testout'
runThatProgram(infile, outfile)
where runThatProgram is:
def runThatProgram(infile, outfile):
    os.system("%s %s > %s" % ('thatProgram', infile, outfile))
The problem is that thatProgram writes lots of stuff to STDERR, which I want to redirect to a file, but I cannot edit the runThatProgram code because it is in a third-party lib!
To illustrate what Rosh Oxymoron said, you can hack the code like this :
from some_lib import runThatProgram
infile = '/tmp/test'
outfile = '/tmp/testout 2>&1'
runThatProgram(infile, outfile)
With this, it will call:
thatProgram /tmp/test > /tmp/testout 2>&1
which redirects stderr (2) to stdout (1), so everything gets logged to your outfile.
To elaborate on using subprocess: you can open the process, give it a pipe, and then work from there:
import subprocess

program = "runthatprogram.py".split()
process = subprocess.Popen(program, stdout=subprocess.PIPE, stderr=open('stderr', 'w'))  # stderr to a file object
print(process.communicate()[0])  # display stdout