I have a time-consuming SNMP walk task to perform, which I am running as a background process using Popen. How can I capture the output of this background task in a log file? In the code below, I am trying to do an snmpwalk on each IP in ip_list and log all the results to abc.txt. However, the generated file abc.txt is empty.
Here is my sample code:
import subprocess
import sys
f = open('abc.txt', 'a+')
ip_list = ["192.163.1.104", "192.163.1.103", "192.163.1.101"]
for ip in ip_list:
cmd = "snmpwalk.exe -t 1 -v2c -c public "
cmd = cmd + ip
print(cmd)
p = subprocess.Popen(cmd, shell=True, stdout=f)
p.wait()
f.close()
print("File output - " + open('abc.txt', 'r').read())
The sample output from the command looks something like this for each IP:
sysDescr.0 = STRING: Software: Whistler Version 5.1 Service Pack 2 (Build 2600)
sysObjectID.0 = OID: win32
sysUpTimeInstance = Timeticks: (15535) 0:02:35.35
sysContact.0 = STRING: unknown
sysName.0 = STRING: UDLDEV
sysLocation.0 = STRING: unknown
sysServices.0 = INTEGER: 72
sysORID.4 = OID: snmpMPDCompliance
I have already tried Popen, but it does not log output to a file when it is a time-consuming background process. It works when the background process is something quick like ls/dir. Any help is appreciated.
The main issue here, I assume, is the expectation of what Popen does and how it works.
p.wait() will wait for the process to finish before continuing; that is why ls, for instance, works while more time-consuming tasks don't appear to. And there's nothing flushing the output automatically until you call p.stdout.flush().
The way you've set it up is more meant to work for:
Execute command
Wait for exit
Catch output
And then work with it. For your use case, you'd be better off using an alternative library, or using stdout=subprocess.PIPE and catching the output yourself. That would mean something along the lines of:
import subprocess
import sys

ip_list = ["192.163.1.104", "192.163.1.103", "192.163.1.101"]

with open('abc.txt', 'a+') as output:
    for ip in ip_list:
        print(cmd := f"snmpwalk.exe -t 1 -v2c -c public {ip}")
        process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)  # Be wary of shell=True
        while process.poll() is None:
            # stdout is a byte stream, so the EOF sentinel is b'' and we decode before writing
            for chunk in iter(lambda: process.stdout.read(1), b''):
                output.write(chunk.decode(errors='replace'))

with open('abc.txt', 'r') as log:
    print("File output: " + log.read())
The key thing to take away here is process.poll(), which checks whether the process has finished; if not, we try to catch the output with process.stdout.read(1), reading one byte at a time. If you know complete lines are coming, you can replace the byte-by-byte loop with output.write(process.stdout.readline()) and you're all set.
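For completeness, here is a minimal sketch of that readline() variant, reusing the same command and file names as above (snmpwalk.exe and abc.txt come from the question; treat this as illustrative, not as the only correct form):

import subprocess

ip_list = ["192.163.1.104", "192.163.1.103", "192.163.1.101"]

with open('abc.txt', 'a+') as output:
    for ip in ip_list:
        process = subprocess.Popen(f"snmpwalk.exe -t 1 -v2c -c public {ip}",
                                   shell=True, stdout=subprocess.PIPE)
        # readline() returns b'' only at EOF, so this consumes the output line by line
        for raw_line in iter(process.stdout.readline, b''):
            output.write(raw_line.decode(errors='replace'))
        process.wait()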
I am running a script that iterates through a text file. On each line of the text file there is an IP address. The script grabs the banner, then writes the IP + banner to another file.
The problem is, it just stops around 500 lines, more or less, with no error.
Another weird thing: if I run it with python3 it does what I said above. If I run it with python it iterates through those 500 lines, then starts at the beginning. I noticed this when I saw repetitions in my output file. Anyway, here is the code; maybe you can tell me what I'm doing wrong:
import os
import subprocess
import concurrent.futures
#import time, random
import threading
import multiprocessing
with open("ipuri666.txt") as f:
def multiprocessing_func():
try:
line2 = line.rstrip('\r\n')
a = subprocess.Popen(["curl", "-I", line2, "--connect-timeout", "1", "--max-time", "1"], stdout=subprocess.PIPE)
b = subprocess.Popen(["grep", "Server"], stdin=a.stdout, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
#a.stdout.close()
out, err = b.communicate()
g = open("IP_BANNER2","a")
print( "out: {0}".format(out))
g.write(line2 + " " + "out: {0}\n".format(out))
print("err: {0}".format(err))
except IOError:
print("Connection timed out")
if __name__ == '__main__':
#starttime = time.time()
processes = []
for line in f:
p = multiprocessing.Process(target=multiprocessing_func, args=())
processes.append(p)
p.start()
for process in processes:
process.join()
If your use case allows, I would recommend just rewriting this as a shell script; there is no need to use Python. (This would likely solve your issue indirectly.)
#!/usr/bin/env bash
readarray -t ips < ipuri666.txt
for ip in "${ips[@]}"; do
    output=$(curl -I "$ip" --connect-timeout 1 --max-time 1 | grep "Server")
    echo "$ip $output" >> fisier.txt
done
The script is slightly simpler than what you are trying to do; for instance, I do not capture the error. This should be pretty close to what you are trying to accomplish. I will update again if needed.
I'd like to make the output of tail -F or something similar available to me in Python without blocking or locking. I've found some really old code to do that here, but I'm thinking there must be a better way or a library to do the same thing by now. Anyone know of one?
Ideally, I'd have something like tail.getNewData() that I could call every time I wanted more data.
Non Blocking
If you are on Linux (Windows does not support calling select on files) you can use the subprocess module along with the select module.
import time
import subprocess
import select
f = subprocess.Popen(['tail', '-F', filename],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p = select.poll()
p.register(f.stdout)

while True:
    if p.poll(1):
        print f.stdout.readline()
    time.sleep(1)
This polls the output pipe for new data and prints it when it is available. Normally the time.sleep(1) and print f.stdout.readline() would be replaced with useful code.
Blocking
You can use the subprocess module without the extra select module calls.
import subprocess
f = subprocess.Popen(['tail', '-F', filename],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
while True:
    line = f.stdout.readline()
    print line
This will also print new lines as they are added, but it will block until the tail program is closed, probably with f.kill().
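If you need to stop that loop cleanly, here is a small sketch (Python 3 style, reusing f from above) that kills the tail process when the script is interrupted; treat it as one possible shutdown pattern, not the only one:

try:
    for line in iter(f.stdout.readline, b''):
        print(line.decode(errors='replace'), end='')
except KeyboardInterrupt:
    pass
finally:
    f.kill()   # stop the tail subprocess so the pipe reaches EOF
    f.wait()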
Using the sh module (pip install sh):
from sh import tail
# runs forever
for line in tail("-f", "/var/log/some_log_file.log", _iter=True):
    print(line)
[update]
Since sh.tail with _iter=True is a generator, you can:
import sh
tail = sh.tail("-f", "/var/log/some_log_file.log", _iter=True)
Then you can "getNewData" with:
new_data = next(tail)
Note that if the tail buffer is empty, it will block until there is more data (from your question it is not clear what you want to do in this case).
[update]
This works if you replace -f with -F, but in Python it would be locking. I'd be more interested in having a function I could call to get new data when I want it, if that's possible. – Eli
A container generator placing the tail call inside a while True loop and catching eventual I/O exceptions will have almost the same effect as -F.
def tail_F(some_file):
    while True:
        try:
            for line in sh.tail("-f", some_file, _iter=True):
                yield line
        except sh.ErrorReturnCode_1:
            yield None
If the file becomes inaccessible, the generator will yield None. However, it still blocks until there is new data if the file is accessible. It remains unclear to me what you want to do in this case.
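A minimal usage sketch for that generator ("app.log" is just a placeholder path):

for line in tail_F("app.log"):
    if line is None:
        # tail exited (e.g. the file disappeared); the generator will restart it
        continue
    print(line, end="")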
Raymond Hettinger's approach seems pretty good:
def tail_F(some_file):
    first_call = True
    while True:
        try:
            with open(some_file) as input:
                if first_call:
                    input.seek(0, 2)
                    first_call = False
                latest_data = input.read()
                while True:
                    if '\n' not in latest_data:
                        latest_data += input.read()
                        if '\n' not in latest_data:
                            yield ''
                            if not os.path.isfile(some_file):
                                break
                            continue
                    latest_lines = latest_data.split('\n')
                    if latest_data[-1] != '\n':
                        latest_data = latest_lines[-1]
                    else:
                        latest_data = input.read()
                    for line in latest_lines[:-1]:
                        yield line + '\n'
        except IOError:
            yield ''
This generator will return '' if the file becomes inaccessible or if there is no new data.
[update]
The second to last answer circles around to the top of the file it seems whenever it runs out of data. – Eli
I think the second will output the last ten lines whenever the tail process ends, which with -f is whenever there is an I/O error. The tail --follow --retry behavior is not far from this for most cases I can think of in unix-like environments.
Perhaps if you update your question to explain what is your real goal (the reason why you want to mimic tail --retry), you will get a better answer.
The last answer does not actually follow the tail and merely reads what's available at run time. – Eli
Of course, tail will display the last 10 lines by default... You can position the file pointer at the end of the file using file.seek; I will leave a proper implementation as an exercise to the reader.
IMHO the file.read() approach is far more elegant than a subprocess based solution.
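One possible sketch of that exercise, assuming a plain text file that is only ever appended to (follow_end and its parameters are illustrative names, not part of any library):

import time

def follow_end(path, sleep_sec=0.5):
    # Skip the existing content by seeking to the end, then yield whatever gets appended.
    with open(path) as f:
        f.seek(0, 2)              # 2 == os.SEEK_END
        while True:
            chunk = f.read()
            if chunk:
                yield chunk
            else:
                time.sleep(sleep_sec)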
Purely pythonic solution using non-blocking readline()
Adapting Ijaz Ahmad Khan's answer to yield lines only when they are completely written (lines end with a newline char) gives a pythonic solution with no external dependencies:
import time
from typing import Iterator


def follow(file, sleep_sec=0.1) -> Iterator[str]:
    """ Yield each line from a file as they are written.
    `sleep_sec` is the time to sleep after empty reads. """
    line = ''
    while True:
        tmp = file.readline()
        if tmp:  # readline() returns '' (not None) when there is no new data
            line += tmp
            if line.endswith("\n"):
                yield line
                line = ''
        elif sleep_sec:
            time.sleep(sleep_sec)


if __name__ == '__main__':
    with open("test.txt", 'r') as file:
        for line in follow(file):
            print(line, end='')
The only portable way to tail -f a file appears to be, in fact, to read from it and retry (after a sleep) if the read returns 0. The tail utilities on various platforms use platform-specific tricks (e.g. kqueue on BSD) to efficiently tail a file forever without needing sleep.
Therefore, implementing a good tail -f purely in Python is probably not a good idea, since you would have to use the least-common-denominator implementation (without resorting to platform-specific hacks). Using a simple subprocess to open tail -f and iterating through the lines in a separate thread, you can easily implement a non-blocking tail operation in Python.
Example implementation:
import threading, Queue, subprocess

tailq = Queue.Queue(maxsize=10)  # buffer at most 10 lines


def tail_forever(fn):
    p = subprocess.Popen(["tail", "-f", fn], stdout=subprocess.PIPE)
    while 1:
        line = p.stdout.readline()
        tailq.put(line)
        if not line:
            break


threading.Thread(target=tail_forever, args=(fn,)).start()  # fn is the path of the file to tail

print tailq.get()          # blocks
print tailq.get_nowait()   # throws Queue.Empty if there are no lines to read
All the answers that use tail -f are not pythonic.
Here is the pythonic way (using no external tool or library):
import time


def follow(thefile):
    while True:
        line = thefile.readline()
        if not line or not line.endswith('\n'):
            time.sleep(0.1)
            continue
        yield line


if __name__ == '__main__':
    logfile = open("run/foo/access-log", "r")
    loglines = follow(logfile)
    for line in loglines:
        print(line, end='')
So, this is coming quite late, but I ran into the same problem again, and there's a much better solution now. Just use pygtail:
Pygtail reads log file lines that have not been read. It will even
handle log files that have been rotated. Based on logcheck's logtail2
(http://logcheck.org)
Ideally, I'd have something like tail.getNewData() that I could call every time I wanted more data
We've already got one, and it's very nice. Just call f.read() whenever you want more data. It will start reading where the previous read left off, and it will read through the end of the data stream:
f = open('somefile.log')
p = 0
while True:
    f.seek(p)
    latest_data = f.read()
    p = f.tell()
    if latest_data:
        print latest_data
        print str(p).center(10).center(80, '=')
For reading line-by-line, use f.readline(). Sometimes, the file being read will end with a partially read line. Handle that case with f.tell() finding the current file position and using f.seek() for moving the file pointer back to the beginning of the incomplete line. See this ActiveState recipe for working code.
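A hedged sketch of that tell()/seek() trick (readlines_complete is an illustrative name, not from the recipe): if readline() returns a partial line with no trailing newline, rewind to where the line started and try again later.

import time

def readlines_complete(f, sleep_sec=0.5):
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            time.sleep(sleep_sec)      # nothing new yet
        elif not line.endswith('\n'):
            f.seek(pos)                # partial line: rewind and retry later
            time.sleep(sleep_sec)
        else:
            yield line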
You could use the 'tailer' library: https://pypi.python.org/pypi/tailer/
It has an option to get the last few lines:
import tailer

# Get the last 3 lines of the file
tailer.tail(open('test.txt'), 3)
# ['Line 9', 'Line 10', 'Line 11']
And it can also follow a file:
# Follow the file as it grows
for line in tailer.follow(open('test.txt')):
    print line
If one wants tail-like behaviour, that one seems to be a good option.
Another option is the tailhead library, which provides both Python versions of the tail and head utilities and an API that can be used in your own module.
Originally based on the tailer module, its main advantage is the ability to follow files by path, i.e. it can handle the situation when a file is recreated. Besides, it has some bug fixes for various edge cases.
Python is "batteries included" - it has a nice solution for it: https://pypi.python.org/pypi/pygtail
Reads log file lines that have not been read. Remembers where it finished last time, and continues from there.
import sys
from pygtail import Pygtail
for line in Pygtail("some.log"):
    sys.stdout.write(line)
You can also use 'AWK' command.
See more at: http://www.unix.com/shell-programming-scripting/41734-how-print-specific-lines-awk.html
awk can be used to tail the last line, the last few lines, or any line in a file.
This can be called from Python.
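For illustration only (the awk program and the file name here are my own example, not from the linked page): awk's END block prints the last record it read, and you can invoke it from Python with subprocess.

import subprocess

# 'END { print }' prints the last record awk read, i.e. the last line of the file
last_line = subprocess.check_output(
    ["awk", "END { print }", "access.log"], text=True
).rstrip("\n")
print(last_line)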
If you are on Linux, you can implement a non-blocking version in Python in the following way.
import subprocess
subprocess.call('xterm -title log -hold -e \"tail -f filename\"&', shell=True, executable='/bin/csh')
print "Done"
# -*- coding:utf-8 -*-
import sys
import time


class Tail():
    def __init__(self, file_name, callback=sys.stdout.write):
        self.file_name = file_name
        self.callback = callback

    def follow(self, n=10):
        try:
            # open the file
            with open(self.file_name, 'r', encoding='UTF-8') as f:
                # with open(self.file_name,'rb') as f:
                self._file = f
                self._file.seek(0, 2)
                # remember the length of the file in characters
                self.file_length = self._file.tell()
                # print the last 10 lines
                self.showLastLine(n)
                # keep reading the file and print whatever gets appended
                while True:
                    line = self._file.readline()
                    if line:
                        self.callback(line)
                    time.sleep(1)
        except Exception as e:
            print('Failed to open the file; check whether it exists or whether there is a permission problem')
            print(e)

    def showLastLine(self, n):
        # assume roughly 100 characters per line; 1 or 1000 would also work
        len_line = 100
        # n defaults to 10, and can also be passed in through follow()
        read_len = len_line * n
        # last_lines stores the content we will finally print
        while True:
            # if the chunk we want to read is larger than the stored file length,
            # read the whole file and break
            if read_len > self.file_length:
                self._file.seek(0)
                last_lines = self._file.read().split('\n')[-n:]
                break
            # otherwise read read_len characters and count the newlines among them
            self._file.seek(-read_len, 2)
            last_words = self._file.read(read_len)
            # count is the number of newlines
            count = last_words.count('\n')
            if count >= n:
                # more than n newlines is easy to handle: just take the last n lines
                last_lines = last_words.split('\n')[-n:]
                break
            # fewer than n newlines
            else:
                # break
                # not enough lines yet
                # if there is not a single newline, assume a line is roughly read_len characters
                if count == 0:
                    len_perline = read_len
                # if there are, say, 4 newlines, assume each line has roughly read_len/4 characters
                else:
                    len_perline = read_len / count
                # enlarge the length to read accordingly and check again
                read_len = len_perline * n
        for line in last_lines:
            self.callback(line + '\n')


if __name__ == '__main__':
    py_tail = Tail('test.txt')
    py_tail.follow(1)
A simple tail function from the PyPI package tailread.
You can also install it via pip install tailread.
Recommended for tail access of large files.
from io import BufferedReader


def readlines(bytesio, batch_size=1024, keepends=True, **encoding_kwargs):
    '''bytesio: file path or BufferedReader
       batch_size: size to be processed
    '''
    path = None
    if isinstance(bytesio, str):
        path = bytesio
        bytesio = open(path, 'rb')
    elif not isinstance(bytesio, BufferedReader):
        raise TypeError('The first argument to readlines must be a file path or a BufferedReader')

    bytesio.seek(0, 2)
    end = bytesio.tell()

    buf = b""
    for p in reversed(range(0, end, batch_size)):
        bytesio.seek(p)

        lines = []
        remain = min(end - p, batch_size)
        while remain > 0:
            line = bytesio.readline()[:remain]
            lines.append(line)
            remain -= len(line)

        cut, *parsed = lines
        for line in reversed(parsed):
            if buf:
                line += buf
                buf = b""
            if encoding_kwargs:
                line = line.decode(**encoding_kwargs)
            yield from reversed(line.splitlines(keepends))
        buf = cut + buf

    if path:
        bytesio.close()

    if encoding_kwargs:
        buf = buf.decode(**encoding_kwargs)
    yield from reversed(buf.splitlines(keepends))
for line in readlines('access.log', encoding='utf-8', errors='replace'):
    print(line)
    if 'line 8' in line:
        break
# line 11
# line 10
# line 9
# line 8
This Python code pipes data through a Perl script fine.
import subprocess
kw = {}
kw['executable'] = None
kw['shell'] = True
kw['stdin'] = None
kw['stdout'] = subprocess.PIPE
kw['stderr'] = subprocess.PIPE
args = ' '.join(['/usr/bin/perl','-w','/path/script.perl','<','/path/mydata'])
subproc = subprocess.Popen(args,**kw)
for line in iter(subproc.stdout.readline, ''):
    print line.rstrip().decode('UTF-8')
However, it requires that I first save my buffers to a disk file (/path/mydata). It's cleaner to loop through the data in Python code and pass it line by line to the subprocess, like this:
import subprocess
import codecs

kw = {}
kw['executable'] = '/usr/bin/perl'
kw['shell'] = False
kw['stderr'] = subprocess.PIPE
kw['stdin'] = subprocess.PIPE
kw['stdout'] = subprocess.PIPE

args = ['-w','/path/script.perl',]
subproc = subprocess.Popen(args,**kw)
f = codecs.open('/path/mydata','r','UTF-8')
for line in f:
    subproc.stdin.write('%s\n'%(line.strip().encode('UTF-8')))
    print line.strip()  ### code hangs after printing this ###
    for line in iter(subproc.stdout.readline, ''):
        print line.rstrip().decode('UTF-8')

subproc.terminate()
f.close()
The code hangs at the readline after sending the first line to the subprocess. I have other executables that use this exact same code perfectly.
My data files can be quite large (1.5 GB). Is there a way to accomplish piping the data without saving it to a file? I don't want to rewrite the Perl script for compatibility with other systems.
Your code is blocking at the line:
for line in iter(subproc.stdout.readline, ''):
because the only way this iteration can terminate is when EOF (end-of-file) is reached, which will happen when the subprocess terminates. You don't want to wait till the process terminates, however; you only want to wait till it has finished processing the line that was sent to it.
Furthermore, you're encountering issues with buffering, as Chris Morgan has already pointed out. Another question on Stack Overflow discusses how you can do non-blocking reads with subprocess. I've hacked up a quick and dirty adaptation of the code from that question to your problem:
import subprocess
import threading
import Queue
import codecs


def enqueue_output(out, queue):
    for line in iter(out.readline, ''):
        queue.put(line)
    out.close()


kw = {}
kw['executable'] = '/usr/bin/perl'
kw['shell'] = False
kw['stderr'] = subprocess.PIPE
kw['stdin'] = subprocess.PIPE
kw['stdout'] = subprocess.PIPE

args = ['-w','/path/script.perl',]
subproc = subprocess.Popen(args, **kw)
f = codecs.open('/path/mydata','r','UTF-8')
q = Queue.Queue()
t = threading.Thread(target = enqueue_output, args = (subproc.stdout, q))
t.daemon = True
t.start()

for line in f:
    subproc.stdin.write('%s\n'%(line.strip().encode('UTF-8')))
    print "Sent:", line.strip()
    try:
        line = q.get_nowait()
    except Queue.Empty:
        pass
    else:
        print "Received:", line.rstrip().decode('UTF-8')

subproc.terminate()
f.close()
It's quite likely that you'll need to make modifications to this code, but at least it doesn't block.
Thanks srgerg. I had also tried the threading solution. That solution alone, however, always hung. Both my previous code and srgerg's code were missing the final piece; your tip gave me one last idea.
The final solution writes enough dummy data to force the final valid lines out of the buffer. To support this, I added code that tracks how many valid lines were written to stdin. The threaded loop opens the output file, saves the data, and breaks when the lines read equal the valid input lines. This solution ensures it reads and writes line by line for any size file.
def std_output(stdout, outfile=''):
    out = 0
    f = codecs.open(outfile,'w','UTF-8')
    for line in iter(stdout.readline, ''):
        f.write('%s\n'%(line.rstrip().decode('UTF-8')))
        out += 1
        if i == out: break
    stdout.close()
    f.close()


outfile = '/path/myout'
infile = '/path/mydata'

subproc = subprocess.Popen(args,**kw)
t = threading.Thread(target=std_output, args=[subproc.stdout, outfile])
t.daemon = True
t.start()

i = 0
f = codecs.open(infile,'r','UTF-8')
for line in f:
    subproc.stdin.write('%s\n'%(line.strip().encode('UTF-8')))
    i += 1
subproc.stdin.write('%s\n'%(' '*4096))  ### push dummy data ###
f.close()
t.join()
subproc.terminate()
See the warnings mentioned in the manual about using Popen.stdin and Popen.stdout (just above Popen.stdin):
Warning: Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
I realise that having a gigabyte-and-a-half string in memory all at once isn't very desirable, but using communicate() is a way that will work, while as you've observed, once the OS pipe buffer fills up, the stdin.write() + stdout.read() way can become deadlocked.
Is using communicate() feasible for you?
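If it is, here is a hedged sketch of that route, reusing the paths from the question (the whole input is read into memory, which is the trade-off noted above):

import subprocess

with open('/path/mydata', 'rb') as f:
    data = f.read()

subproc = subprocess.Popen(
    ['/usr/bin/perl', '-w', '/path/script.perl'],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
)
out, err = subproc.communicate(data)   # blocks until the Perl script exits
for line in out.splitlines():
    print(line.decode('UTF-8'))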
import re
import subprocess
import sys

sub = subprocess.Popen(['/home/karthik/Downloads/stanford-parser-2011-06-08/lexparser.csh'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
sub.stdin.write("i am a fan of ac milan which is the best club in the world")
relns = []
while(True):
    rel = sub.stdout.readline()
    m = re.search("Sentence skipped", rel)
    if m != None:
        print 'stop'
        sys.exit(0)
    if rel == '\n':
        break
    relns.append(rel)
print relns
sub.terminate()
So I want to run the Stanford parser, using lexparser.csh to parse this line of text. But when I run this piece of code I am getting the output for the default text; the actual text given is not being parsed. So am I using pipes the right way? And I've seen in a lot of examples that a '-' is used along with the command. Why is that being used? Because when I use it, the script just stalls at sub.stdout.readline().
You may need to call flush() on sub.stdin after writing.
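A small sketch of that suggestion (the trailing newline is my own addition, since line-buffered readers usually need one before they react):

sub.stdin.write("i am a fan of ac milan which is the best club in the world\n")
sub.stdin.flush()      # push the buffered bytes to the child process
# If you have no more input to send, closing stdin signals EOF to the parser:
# sub.stdin.close()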